yanqiangmiffy / InstructGLM

ChatGLM-6B instruction learning | instruction data | Instruct
MIT License
654 stars 51 forks

Problems in train_deepspeed.py with ZeRO stage 1|2|3 #34

Open zjhJOJO opened 1 year ago

zjhJOJO commented 1 year ago

When I run 'train_deepspeed.py' with the command you provided, the number of trainable parameters is always zero. I traced the problem to line 18 of 'insert_lora.py': the program always falls into the exception block there. This has been quite frustrating, and I would greatly appreciate any help with it.
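To narrow this down, a small diagnostic along the following lines might help. It is only a sketch: `count_trainable` and `run_and_report` are hypothetical helper names, not part of this repository, and the contents of 'insert_lora.py' are not shown here, so the step that raises has to be supplied by the caller. The first helper counts trainable versus total parameter elements on anything that exposes `requires_grad` and `numel()`, such as the items yielded by a PyTorch module's `parameters()`. The second runs a callable and prints the full traceback instead of silently swallowing the exception.

```python
import traceback
from typing import Any, Callable, Iterable, Optional, Tuple


def count_trainable(parameters: Iterable[Any]) -> Tuple[int, int]:
    """Return (trainable, total) element counts.

    Assumes each item exposes `requires_grad` and `numel()`, as the
    parameters of a PyTorch module do.
    """
    trainable = 0
    total = 0
    for p in parameters:
        n = p.numel()
        total += n
        if p.requires_grad:
            trainable += n
    return trainable, total


def run_and_report(step: Callable[[], Any]) -> Optional[BaseException]:
    """Call `step`; if it raises, print the full traceback and return the exception.

    Returns None when `step` completes without raising.
    """
    try:
        step()
    except Exception as exc:
        # Print the real error rather than hiding it behind a bare except.
        traceback.print_exc()
        return exc
    return None
```

For example, `trainable, total = count_trainable(model.parameters())` right after the LoRA insertion step would show whether any parameter has `requires_grad` set, and wrapping the failing call in `run_and_report(lambda: ...)` would reveal which exception is actually raised.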