How to continue model training?

phamkhactu commented 9 months ago

Check before submitting issues

[X] Make sure to pull the latest code, as some issues and bugs have been fixed.
[X] Due to frequent dependency updates, please ensure you have followed the steps in our Wiki
[X] I have read the FAQ section AND searched for similar issues and did not find a similar problem or solution
[X] Third-party plugin issues - e.g., llama.cpp, text-generation-webui, LlamaChat, we recommend checking the corresponding project for solutions
[X] Model validity check - Be sure to check the model's SHA256.md. If the model is incorrect, we cannot guarantee its performance

Type of Issue

Model training and fine-tuning

Base Model

LLaMA-7B

Operating System

Linux

Describe your issue in detail

I trained the model using run_clm_pt_with_peft.py, but my machine shut down suddenly after the model had already trained for some steps. Now I want to resume from the LoRA checkpoint to continue training. I've read the README but did not find anything about this.

Many thanks for your help.

Dependencies (must be provided for code-related issues)

No response

Execution logs or screenshots

No response

GokulNC-Sarvam commented 8 months ago

Hi @phamkhactu, how did you solve the problem?

phamkhactu commented 8 months ago

Hi @GokulNC-Sarvam, I used the Trainer and resumed from the checkpoint:

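    # resume_from_checkpoint: path to a saved checkpoint-<step> folder, or True for the latest one in output_dir
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)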
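The `trainer.train(resume_from_checkpoint=...)` call needs to know which checkpoint to load. As a minimal sketch, assuming the interrupted run was saved by the Hugging Face `Trainer` into its `output_dir` using the default `checkpoint-<step>` folder naming, the helper below picks the folder with the highest step. `find_latest_checkpoint` is a hypothetical name used only for illustration; `transformers.trainer_utils.get_last_checkpoint` provides similar functionality if you prefer to stay within the library.

```python
import os
import re
from typing import Optional

# Assumption: checkpoints follow the Trainer's default "checkpoint-<global_step>" naming.
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_latest_checkpoint(output_dir: str) -> Optional[str]:
    """Return the checkpoint-<step> folder with the highest step, or None if none exist."""
    if not os.path.isdir(output_dir):
        return None
    best_step, best_path = -1, None
    for name in os.listdir(output_dir):
        match = _CHECKPOINT_RE.match(name)
        path = os.path.join(output_dir, name)
        # Only directories count; compare steps numerically so 1500 beats 500.
        if match and os.path.isdir(path):
            step = int(match.group(1))
            if step > best_step:
                best_step, best_path = step, path
    return best_path


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as out:
        for step in (500, 1500, 1000):
            os.makedirs(os.path.join(out, f"checkpoint-{step}"))
        os.makedirs(os.path.join(out, "runs"))  # unrelated folder, ignored
        latest = find_latest_checkpoint(out)
        print(os.path.basename(latest))  # checkpoint-1500
```

The returned path can then be passed as `trainer.train(resume_from_checkpoint=latest)`; passing `resume_from_checkpoint=True` instead asks the `Trainer` to look up the last checkpoint in `args.output_dir` itself. Whether the LoRA adapter weights, optimizer, and scheduler state are all restored as expected depends on the installed transformers and peft versions, so it is worth verifying on a short run first.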

ymcui / Chinese-LLaMA-Alpaca