Open pretidav opened 1 month ago
I was wondering if there is a straightforward way to convert a sharded checkpoint to a monolithic one, so that I can convert it to HF format afterwards (rather than converting sharded -> HF directly). I've read that you can define a monolithic checkpoint-saver callback, but I would prefer an "off-training" approach that simply reads the checkpoint and writes it back out in the new format.
Thanks for all the answers.
Unfortunately, we have not written a straightforward script for this. As a workaround, you can launch training with the HF checkpointer callback enabled for 1 batch with a very small learning rate. Alternatively, you can modify train.py so that it calls the callback's save-checkpoint function directly and skips training.
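For the "off-training" route asked about above, the core step is to read every rank's shard and stitch the slices back into one full tensor per parameter. The sketch below shows only that idea, in plain Python. It assumes a hypothetical layout where each shard maps a parameter name to `(start_offset, values)` of its flattened tensor, plus a separate table of total element counts. `consolidate` and this layout are illustrative assumptions, not the on-disk format of any particular training framework. A real converter would also have to read the framework's own checkpoint metadata and restore tensor shapes.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical shard layout (NOT any framework's real on-disk format):
# each rank stores, per parameter, one contiguous slice of the flattened
# tensor as (start_offset, values).
Shard = Dict[str, Tuple[int, List[float]]]


def consolidate(shards: List[Shard], numels: Dict[str, int]) -> Dict[str, List[float]]:
    """Merge per-rank flat slices into one full (monolithic) flat list per parameter."""
    full: Dict[str, List[float]] = {}
    for name, numel in numels.items():
        buf: List[Optional[float]] = [None] * numel
        for shard in shards:
            if name not in shard:
                continue  # this rank holds no slice of the parameter
            start, values = shard[name]
            end = start + len(values)
            if start < 0 or end > numel:
                raise ValueError(f"{name}: slice [{start}, {end}) out of range for numel {numel}")
            for i, v in enumerate(values, start=start):
                if buf[i] is not None:
                    raise ValueError(f"{name}: element {i} appears in more than one shard")
                buf[i] = v
        missing = [i for i, v in enumerate(buf) if v is None]
        if missing:
            raise ValueError(f"{name}: elements {missing} are missing from all shards")
        full[name] = [v for v in buf if v is not None]  # all slots are filled at this point
    return full


if __name__ == "__main__":
    # Two "ranks": rank 0 holds w[0:2] and all of b, rank 1 holds w[2:4].
    shards = [
        {"w": (0, [1.0, 2.0]), "b": (0, [0.5])},
        {"w": (2, [3.0, 4.0])},
    ]
    print(consolidate(shards, {"w": 4, "b": 1}))
    # {'w': [1.0, 2.0, 3.0, 4.0], 'b': [0.5]}
```

The overlap and gap checks are there so that a missing or duplicated shard fails loudly instead of silently producing a corrupt monolithic checkpoint, which would only surface later during the HF conversion.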