nomic-ai / contrastors

Train Models Contrastively in Pytorch
Apache License 2.0

Unable to restart training from checkpoint. #17

Closed sandeep-krutrim closed 6 months ago

sandeep-krutrim commented 6 months ago

I am trying to do MLM pretraining. I set the checkpoint location in mlm.yaml. The checkpoint directory contains config.json and the model safetensors file from a previous run. However, the training script does not pick up the checkpoint and resume training. Any help would be appreciated.

zanussbaum commented 6 months ago

Can you provide any error logs and more details, please? Without them I can only guess at what's happening.
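One way to collect the details requested here is to list what the checkpoint directory actually contains before starting the run. The snippet below is a minimal sketch using only the Python standard library. `describe_checkpoint` is a hypothetical helper, not part of contrastors, and the path `path/to/checkpoint` is a placeholder. It only checks for the two kinds of file mentioned in the report (config.json and a .safetensors weights file) and makes no assumption about what else the training script expects.

```python
from pathlib import Path


def describe_checkpoint(ckpt_dir):
    """Summarize which files are present in a checkpoint directory.

    Hypothetical helper for debugging; it does not load any weights.
    """
    path = Path(ckpt_dir)
    is_dir = path.is_dir()
    # List file names only if the directory exists.
    files = sorted(p.name for p in path.iterdir()) if is_dir else []
    return {
        "exists": is_dir,
        "has_config": "config.json" in files,
        "weight_files": [f for f in files if f.endswith(".safetensors")],
        "files": files,
    }


if __name__ == "__main__":
    # Placeholder path: replace with the location set in mlm.yaml.
    summary = describe_checkpoint("path/to/checkpoint")
    for key, value in summary.items():
        print(f"{key}: {value}")
```

Posting this output together with the training logs shows whether the path resolves at all and which files are present. If the directory holds only a config and model weights, it is worth checking whether the script also needs optimizer or scheduler state to resume, which is an assumption to verify against the repository rather than a confirmed cause.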

zanussbaum commented 6 months ago

This should be fixed in #24.