mosaicml / llm-foundry

LLM training code for Databricks foundation models
https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm
Apache License 2.0
3.99k stars, 525 forks

Add a config arg to just save an hf checkpoint #1335

Closed: dakinggg closed this 3 months ago

dakinggg commented 3 months ago

We have an offline script that converts Composer checkpoints to the HF checkpoint format, but it only works for monolithic checkpoints and has some other components hardcoded. We are also working on improved utilities for working with checkpoints in Composer. As a stopgap, this PR introduces a config arg that runs train.py as usual, but just calls the HF checkpointer's save and then exits.
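For readers who want a picture of the control flow, here is a minimal sketch of the "save and exit" path described above. It is not the code from this PR: the flag name `only_hf_checkpoint`, the `FakeTrainer` and `FakeHFCheckpointer` classes, and the `run` function are all hypothetical stand-ins, and the check for exactly one HF checkpointer is an assumption made for the sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeHFCheckpointer:
    """Hypothetical stand-in for the HF checkpointer callback."""
    saved: List[str] = field(default_factory=list)

    def save(self, state: str) -> None:
        # A real checkpointer would write HF-format files; this only records the call.
        self.saved.append(state)


@dataclass
class FakeTrainer:
    """Hypothetical stand-in for the trainer that train.py builds."""
    state: str = "loaded-composer-checkpoint"
    fit_called: bool = False

    def fit(self) -> None:
        self.fit_called = True


def run(cfg: dict, trainer: FakeTrainer, callbacks: list) -> FakeTrainer:
    # `only_hf_checkpoint` is an assumed flag name, not taken from the PR text.
    if cfg.get("only_hf_checkpoint", False):
        hf_callbacks = [c for c in callbacks if isinstance(c, FakeHFCheckpointer)]
        # Assumption for this sketch: require exactly one HF checkpointer.
        if len(hf_callbacks) != 1:
            raise ValueError("expected exactly one HF checkpointer callback")
        hf_callbacks[0].save(trainer.state)
        return trainer  # exit early without calling fit()
    trainer.fit()
    return trainer
```

With the flag set, `run` calls the checkpointer's save once and returns without calling `fit()`. With the flag absent, it falls through to the normal training path.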

Test run: just-hf-4-nWWD05 (confirmed that the checkpoint shows up in the object store)