shibing624 / MedicalGPT

MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical large language models, implementing incremental pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, and ORPO.
Apache License 2.0

Multiple checkpoints are saved during SFT; can these checkpoints be merged? #253

Closed: rainfallLLF closed this issue 11 months ago

rainfallLLF commented 11 months ago

As the title says: SFT can take a very long time, so is merging an intermediate checkpoint supported? I tried pointing --peft_model_path directly at a checkpoint directory, e.g. ./chatglm-sft/20231103/checkpoint-52000, but it fails with the error below. Any pointers would be appreciated, thanks!

    python merge_peft_adapter.py --model_type chatglm --base_model_name_or_path ./chatglm2-6b --peft_model_path ./chatglm-sft/20231103/checkpoint-52000 --output_dir merged-chatglm-sft/20231103

    Namespace(model_type='chatglm', base_model_name_or_path='./chatglm2-6b', tokenizer_path=None, peft_model_path='./chatglm-sft/20231103/checkpoint-52000', resize_emb=False, output_dir='merged-chatglm-sft/20231103')
    Base model: ./chatglm2-6b
    LoRA model: ./chatglm-sft/20231103/checkpoint-52000
    Loading LoRA for causal language model
    Loading checkpoint shards: 100%|██████████| 7/7 [00:11<00:00, 1.62s/it]
    Traceback (most recent call last):
      File "/opt/XXX/model/MedicalGPT/merge_peft_adapter.py", line 111, in <module>
        main()
      File "/opt/XXX/model/MedicalGPT/merge_peft_adapter.py", line 87, in main
        tokenizer = tokenizer_class.from_pretrained(peft_model_path, trust_remote_code=True)
      File "/opt/XXX/anaconda3/envs/MedicalGPT/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 701, in from_pretrained
        config = AutoConfig.from_pretrained(
      File "/opt/XXX/anaconda3/envs/MedicalGPT/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 1023, in from_pretrained
        config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
      File "/opt/XXX/anaconda3/envs/MedicalGPT/lib/python3.9/site-packages/transformers/configuration_utils.py", line 620, in get_config_dict
        config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
      File "/opt/XXX/anaconda3/envs/MedicalGPT/lib/python3.9/site-packages/transformers/configuration_utils.py", line 675, in _get_config_dict
        resolved_config_file = cached_file(
      File "/opt/XXX/anaconda3/envs/MedicalGPT/lib/python3.9/site-packages/transformers/utils/hub.py", line 400, in cached_file
        raise EnvironmentError(
    OSError: ./chatglm-sft/20231103/checkpoint-52000 does not appear to have a file named config.json. Checkout 'https://huggingface.co/./chatglm-sft/20231103/checkpoint-52000/None' for available files.
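The merge itself is not what fails here: the traceback shows the error occurs in the tokenizer load. A mid-training checkpoint directory contains only the LoRA adapter files (adapter_config.json and the adapter weights), not a full config.json or tokenizer files, so AutoTokenizer.from_pretrained cannot resolve it. Below is a minimal sketch of the workaround, loading the tokenizer from the base model instead; it assumes the checkpoint is a standard peft LoRA adapter and reuses the paths from the command above:

```python
# Minimal sketch: merge a mid-training LoRA checkpoint into the base model.
# Assumes a standard peft LoRA adapter; paths taken from the command above.
from peft import PeftModel
from transformers import AutoModel, AutoTokenizer

base_model_path = "./chatglm2-6b"                            # full model: has config.json
peft_model_path = "./chatglm-sft/20231103/checkpoint-52000"  # adapter only: no config.json

# Load the tokenizer from the BASE model; the checkpoint dir has no tokenizer/config files.
tokenizer = AutoTokenizer.from_pretrained(base_model_path, trust_remote_code=True)

# Load the base model, attach the LoRA adapter, and fold its weights in.
base_model = AutoModel.from_pretrained(base_model_path, trust_remote_code=True)
model = PeftModel.from_pretrained(base_model, peft_model_path)
model = model.merge_and_unload()

# Save a standalone merged model that loads without peft.
output_dir = "merged-chatglm-sft/20231103"
model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)
```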

tszslovewanpu commented 11 months ago

It should work; but the merged model I produced came out as .safetensors, and it can't be loaded in the subsequent training...
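If the follow-up training script only reads pytorch_model*.bin weights, one workaround (a sketch, not this repo's own tooling; the paths are hypothetical) is to load the merged model once and re-save it with safetensors serialization disabled:

```python
# Sketch: re-save a .safetensors checkpoint as .bin weights.
from transformers import AutoModel, AutoTokenizer

merged_dir = "merged-chatglm-sft/20231103"   # dir containing model.safetensors
out_dir = "merged-chatglm-sft/20231103-bin"  # hypothetical output dir

model = AutoModel.from_pretrained(merged_dir, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(merged_dir, trust_remote_code=True)

# safe_serialization=False writes pytorch_model*.bin instead of model.safetensors.
model.save_pretrained(out_dir, safe_serialization=False)
tokenizer.save_pretrained(out_dir)
```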

shibing624 commented 11 months ago

Set the tokenizer path to the base model and it will work.
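Concretely, the Namespace dump above shows merge_peft_adapter.py already accepts a tokenizer_path argument (it was None in the failing run), so the corrected invocation would look like this (assuming the flag name follows the attribute name):

```
python merge_peft_adapter.py \
    --model_type chatglm \
    --base_model_name_or_path ./chatglm2-6b \
    --tokenizer_path ./chatglm2-6b \
    --peft_model_path ./chatglm-sft/20231103/checkpoint-52000 \
    --output_dir merged-chatglm-sft/20231103
```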

rainfallLLF commented 11 months ago

> Set the tokenizer path to the base model and it will work.

Solved, thanks!