shibing624 / MedicalGPT

MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical large language models, implementing continued pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, and ORPO.
Apache License 2.0

Connection failure when saving the model after training: requests.exceptions.ConnectionError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443 #333

Closed josangmi closed 4 months ago

josangmi commented 4 months ago

This may be related to the argument --model_name_or_path merged-pt. Written this way it runs fine in Colab, but on my local server it produces the error below. Writing it as --model_name_or_path ./merged-pt raises the same error.


INFO | main:main:1448 - Saving model checkpoint to outputs-sft-v1
Traceback (most recent call last):

requests.exceptions.ConnectionError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /merged-pt/resolve/main/config.json (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f171ff45490>: Failed to establish a new connection: [Errno 101] Network is unreachable'))"), '(Request ID: d5fc6bab-7884-4ecc-b86e-d800aba40eb9)')
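The resolve URL in the traceback (/merged-pt/resolve/main/config.json) hints at what happened: when transformers cannot find the given string as a directory relative to the current working directory, it falls back to treating it as a Hub repo id and fetches config.json over HTTPS. A simplified stdlib sketch of that fallback (not transformers' actual code; the function name is made up for illustration):

```python
import os

def resolve_config_source(name_or_path: str) -> str:
    """Mimic (in simplified form) how transformers resolves a model name:
    an existing local directory wins; anything else is treated as a
    Hugging Face Hub repo id and resolved over the network."""
    if os.path.isdir(name_or_path):
        return os.path.join(name_or_path, "config.json")
    return f"https://huggingface.co/{name_or_path}/resolve/main/config.json"

# If "merged-pt" is not a directory under the CWD the script runs in,
# this yields exactly the URL seen in the traceback above.
print(resolve_config_source("merged-pt"))
```

So the symptom usually means the training script's working directory is not where merged-pt actually lives, and the offline server then fails on the Hub request.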


Then I renamed merged-pt to merged-pt1 everywhere and the error went away, but there is a warning, even though the config file is clearly present in that directory:
Saving model checkpoint to outputs-sft-v1
/usr/local/conda/envs/llms-1/lib/python3.11/site-packages/peft/utils/save_and_load.py:148: UserWarning: Could not find a config file in ./merged-pt1 - will assume that the vocabulary was not modified. warnings.warn(

Could this be related to the image?

josangmi commented 4 months ago

The same thing happens during DPO training with merged-sft. Why does it try to connect to huggingface at all, when this directory is clearly local? requests.exceptions.ConnectionError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /merged-sft/resolve/main/config.json

shibing624 commented 4 months ago

Take a look at how transformers loads models in its official code. If it is a local model, you can force it to be loaded as a local model.
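A minimal sketch of that suggestion, assuming the checkpoint really is a local directory: pass `local_files_only=True` to `from_pretrained` (a real transformers parameter) so the library never falls back to the Hub, and fail fast with a clear message if the path is wrong. The helper name below is made up for illustration:

```python
import os

def load_local_model(model_dir: str):
    """Load a checkpoint strictly from disk, never contacting huggingface.co.

    Assumes `model_dir` is a local directory containing config.json,
    tokenizer files, and model weights (e.g. ./merged-pt).
    """
    if not os.path.isdir(model_dir):
        # A missing directory is the usual cause of the Hub fallback:
        # surface it directly instead of letting transformers retry the network.
        raise FileNotFoundError(
            f"{model_dir} is not a local directory; check the working "
            f"directory you launch training from")
    # Import inside the function so the path check above is testable
    # even on machines without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_dir, local_files_only=True)
    model = AutoModelForCausalLM.from_pretrained(model_dir, local_files_only=True)
    return model, tokenizer
```

On an offline server you can additionally set the environment variables `HF_HUB_OFFLINE=1` and `TRANSFORMERS_OFFLINE=1` before launching, which blocks all Hub lookups process-wide.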