Zxlan opened 1 year ago
https://github.com/THUDM/ChatGLM2-6B/issues/572 Hello, is there a demo for multi-GPU LoRA fine-tuning? When fine-tuning on two GPUs I get the following error:
```
RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the `forward` function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes, or try to use _set_static_graph() as a workaround if this module graph does not change during training loop. 2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple `checkpoint` functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases in default. You can try to use _set_static_graph() as a workaround if your module graph does not change over iterations.
Parameter at index 55 has been marked as ready twice. This means that multiple autograd engine hooks have fired for this particular parameter during this iteration. You can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print parameter names for further debugging.
```
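For context, this failure typically comes from the reentrant `torch.utils.checkpoint` implementation firing the same parameter's gradient hook more than once within one DDP iteration. Below is a minimal sketch, not from this thread, of the two workarounds the error message itself suggests, assuming a plain PyTorch DDP loop; `build_model` is a hypothetical placeholder:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes launch via `torchrun --nproc_per_node=2 train.py`.
dist.init_process_group(backend="nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model = build_model().cuda(local_rank)  # hypothetical model builder
ddp_model = DDP(model, device_ids=[local_rank])

# Workaround 1: if the same set of parameters is used every iteration,
# declare the graph static so checkpointed re-forwards do not mark the
# same parameter "ready" twice.
ddp_model._set_static_graph()

# Workaround 2 (alternative): switch to the non-reentrant checkpoint
# implementation, which cooperates with DDP, e.g. inside the model's forward:
#   torch.utils.checkpoint.checkpoint(block, hidden, use_reentrant=False)
```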
I started from ptuning.py in the original chatglm2-6b repo and added loading logic for the LoRA fine-tuned model; the later do_eval and do_predict steps reuse the original code logic 👁👁
```python
# Needs: from peft import PeftConfig, PeftModel
#        from transformers import AutoModelForCausalLM, AutoTokenizer
elif model_args.lora_checkpoint is not None:
    logger.info(" *** Lora Model Evaluation *** ")
    # Read the adapter config to find the base model it was trained from.
    lora_config = PeftConfig.from_pretrained(model_args.lora_checkpoint)
    logger.info(" *** Lora Config *** ")
    logger.info(f" config: {lora_config} ")  # inference_mode=True
    # Load the base model and tokenizer the adapter points at.
    base_model = AutoModelForCausalLM.from_pretrained(
        lora_config.base_model_name_or_path, trust_remote_code=True
    )
    tokenizer = AutoTokenizer.from_pretrained(
        lora_config.base_model_name_or_path, trust_remote_code=True
    )
    logger.info(" *** Base Model Config *** ")
    logger.info(f" config: {base_model.config} ")
    # Attach the LoRA weights, then merge them into the base weights so the
    # original do_eval / do_predict code sees a plain model.
    peft_model = PeftModel.from_pretrained(base_model, model_args.lora_checkpoint)
    logger.info(" *** Lora Model Config *** ")
    logger.info(f" config: {peft_model.config} ")
    model = peft_model.merge_and_unload()
```
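On the training side of the original question, a common multi-GPU LoRA setup with the Hugging Face Trainer looks roughly like the sketch below; this is a sketch under assumptions, not code from this repo, and the hyperparameters are placeholders. The usual cure for the error above is `ddp_find_unused_parameters=False`, since DDP's unused-parameter scan combined with reentrant gradient checkpointing is what marks a parameter ready twice:

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModel, TrainingArguments

# Base ChatGLM2 model; trust_remote_code is required for its custom code.
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    # ChatGLM2 fuses Q/K/V into a single projection named query_key_value.
    target_modules=["query_key_value"],
)
# If gradient checkpointing is enabled, call model.enable_input_require_grads()
# before wrapping, so gradients can flow into the frozen base model's inputs.
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="output/lora",
    per_device_train_batch_size=1,
    # With LoRA most parameters are frozen; DDP's unused-parameter scan plus
    # reentrant checkpointing is what triggers "marked ready twice", so
    # disable the scan.
    ddp_find_unused_parameters=False,
)
```

Launched with, e.g., `torchrun --nproc_per_node=2 main.py ...`, so that one DDP process drives each GPU.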
For inference, please refer to the official chatglm2-6b project on the GLM team's GitHub. If you need to discuss anything, please open an issue on the official chatglm2-6b project and I will do my best to answer. This project is no longer maintained, thank you!