potamides / AutomaTikZ

Text-Guided Synthesis of Scientific Vector Graphics with TikZ
Apache License 2.0

Can I continue training from a checkpoint? #11

Open JasonLLLLLLLLLLL opened 1 year ago

JasonLLLLLLLLLLL commented 1 year ago

It seems train/llama.py can pick up the last checkpoint, but the loss seems to start over again (at 1.6). It should be around 0.2 at this checkpoint.

{'loss': 1.6927, 'learning_rate': 0.0003589922426773994, 'epoch': 24.06}
 38%|█████████▌                | 1549/4096 [25:45<11:16:38,

Or could I write something like this in train/llama.py?

check_point = "/output/checkpoint-1536"
trainer.train(resume_from_checkpoint=check_point)

Sorry to bother you with these questions. I am new to LLM fine-tuning, and I hope you can help.

potamides commented 1 year ago

Where does your checkpoint come from? If you start a fine-tune, checkpoints are created automatically, and if you then interrupt the fine-tuning and start it again (with the same --output directory), training should resume from the latest checkpoint automatically.
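
For reference, assuming train/llama.py builds on the standard Hugging Face Trainer API (the automatic checkpointing described above suggests it does), both resume paths look roughly like the sketch below; model and train_dataset are placeholders rather than names from the repository. When a checkpoint is passed, the Trainer also restores the optimizer, scheduler, and RNG state, so the loss should continue from where training stopped instead of resetting.

from transformers import Trainer, TrainingArguments
from transformers.trainer_utils import get_last_checkpoint

output_dir = "output"  # reuse the same --output directory as the first run
args = TrainingArguments(output_dir=output_dir)

# model and train_dataset stand in for whatever train/llama.py constructs.
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)

# Option 1: let the Trainer locate the newest checkpoint-* folder itself.
last_checkpoint = get_last_checkpoint(output_dir)  # None if no checkpoint exists yet
trainer.train(resume_from_checkpoint=last_checkpoint)

# Option 2: name a specific checkpoint directory explicitly, as in the snippet above:
# trainer.train(resume_from_checkpoint="output/checkpoint-1536")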