ymcui / Chinese-LLaMA-Alpaca

Chinese LLaMA & Alpaca large language models + local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)
https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki
Apache License 2.0
17.98k stars, 1.84k forks

how to continue model training? #848

Closed phamkhactu closed 9 months ago

phamkhactu commented 9 months ago

Check before submitting issues

Type of Issue

Model training and fine-tuning

Base Model

LLaMA-7B

Operating System

Linux

Describe your issue in detail

I trained a model using run_clm_pt_with_peft.py, but my machine shut down suddenly after the model had trained for some steps. Now I want to resume training from the LoRA checkpoint. I've read the README but did not find anything about this.

Many thanks for your help.

Dependencies (must be provided for code-related issues)

No response

Execution logs or screenshots

No response

GokulNC-Sarvam commented 8 months ago

Hi @phamkhactu, how did you solve the problem?

phamkhactu commented 8 months ago

> Hi @phamkhactu, how did you solve the problem?

Hi @GokulNC-Sarvam, I used the trainer and resumed from the checkpoint:

    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
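For anyone landing here later, a minimal sketch of how the `resume_from_checkpoint` value could be picked after a crash. It assumes the Hugging Face `Trainer` convention of saving checkpoints as `checkpoint-<global_step>` folders inside the output directory. The helper `find_last_checkpoint` is a hypothetical name written for illustration (it is not part of this repository), and `output_dir` and `trainer` are assumed to come from your own training script. `transformers.trainer_utils.get_last_checkpoint` does a similar lookup if you prefer the library version.

```python
import os
import re
from typing import Optional

# Assumption: checkpoint folders are named "checkpoint-<global_step>".
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_last_checkpoint(output_dir: str) -> Optional[str]:
    """Return the checkpoint folder with the highest step number, or None.

    Hypothetical helper for illustration; not part of the repository.
    """
    if not os.path.isdir(output_dir):
        return None

    best_step = -1
    best_path = None
    for name in os.listdir(output_dir):
        match = _CHECKPOINT_RE.match(name)
        path = os.path.join(output_dir, name)
        # Only consider real directories whose name matches the pattern.
        if match and os.path.isdir(path):
            step = int(match.group(1))  # compare numerically, not as strings
            if step > best_step:
                best_step, best_path = step, path
    return best_path


# Hypothetical wiring inside a training script:
# resume_from_checkpoint = find_last_checkpoint(output_dir)
# trainer.train(resume_from_checkpoint=resume_from_checkpoint)
```

Comparing step numbers as integers matters here: a plain string sort would rank `checkpoint-200` above `checkpoint-1000`. If no checkpoint folder is found the helper returns `None`, in which case `trainer.train` starts from scratch.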