yangzhipeng1108 / DeepSpeed-Chat-ChatGLM


Could you share the versions of the dependency packages? #2

Closed: EthenZhang closed this issue 1 year ago

EthenZhang commented 1 year ago

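Hi, could you share the versions of the dependency packages? When I train PPO on a single node with 4 A100 GPUs, loading the model fails with NotImplementedError: Cannot copy out of meta tensor; no data! I suspect a package version problem. Thanks!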
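For the meta-tensor error above: PyTorch's "meta" device holds only a tensor's shape and dtype, with no storage, so the message means some weights were never materialized before something tried to copy them to a real device. A minimal diagnostic sketch, assuming a PyTorch model object, is to list the parameters that are still on the meta device right after loading. `find_meta_params` is a hypothetical helper written for this note; it relies only on the real `torch.Tensor.is_meta` attribute (falling back to `device.type == "meta"`), and the demo feeds it stand-in objects so it runs without PyTorch.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Tuple


def find_meta_params(named_params: Iterable[Tuple[str, Any]]) -> List[str]:
    """Return the names of parameters that still live on the "meta" device.

    Hypothetical helper. Meta tensors have a shape and dtype but no storage,
    which is why copying one to a real device fails with
    "Cannot copy out of meta tensor; no data!".
    """
    stuck = []
    for name, param in named_params:
        # torch.Tensor exposes a boolean `is_meta`; fall back to device.type.
        is_meta = getattr(param, "is_meta", None)
        if is_meta is None:
            device = getattr(param, "device", None)
            is_meta = getattr(device, "type", None) == "meta"
        if is_meta:
            stuck.append(name)
    return stuck


if __name__ == "__main__":
    # With PyTorch installed, the real call would be
    #   find_meta_params(model.named_parameters())
    # right after the model is loaded. Here, stand-in objects replace tensors.
    fake_params = [
        ("layer_a.weight", SimpleNamespace(is_meta=False)),
        ("layer_b.weight", SimpleNamespace(device=SimpleNamespace(type="meta"))),
    ]
    print(find_meta_params(fake_params))  # ['layer_b.weight']
```

An empty list after loading suggests the weights were materialized; a non-empty list points at how the checkpoint was loaded for those specific layers.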

yangzhipeng1108 commented 1 year ago

https://github.com/microsoft/DeepSpeedExamples/issues/509 This issue hasn't been resolved yet. I'm not sure whether chatglm-6b simply doesn't support step 3 (the PPO/RLHF stage).

EthenZhang commented 1 year ago

Take a look at this: https://github.com/THUDM/ChatGLM-6B/issues/530. Download the latest chatglm model; I tried it yesterday and it loads. I'd also like to ask: with ZeRO stage 2, tokenizing the corpus is slow, and the job fails with this timeout error:

[E ProcessGroupNCCL.cpp:737] [Rank 2] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=3, OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1806462 milliseconds before timing out.

Do you know how to fix it?
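The Timeout(ms)=1800000 in that log is 30 minutes, and the collective had been waiting about 1806 seconds, so the ranks that had already entered the ALLREDUCE apparently gave up while another rank was still busy (here, plausibly, tokenizing). One common workaround, assuming the training script creates the process group itself, is to pass a longer `timeout` when initializing it: both `torch.distributed.init_process_group` and `deepspeed.init_distributed` accept a `datetime.timedelta` for this. The sketch below is illustrative only; the environment variable name `DIST_TIMEOUT_MINUTES` and both helper functions are hypothetical, not DeepSpeed or PyTorch settings.

```python
import math
import os
from datetime import timedelta

# Hypothetical variable name chosen for this sketch; not read by DeepSpeed.
TIMEOUT_ENV = "DIST_TIMEOUT_MINUTES"
DEFAULT_MINUTES = 30  # same as the Timeout(ms)=1800000 in the log


def dist_timeout(env=None) -> timedelta:
    """Build the collective timeout from an env var, falling back to 30 min."""
    env = os.environ if env is None else env
    try:
        minutes = float(env.get(TIMEOUT_ENV, ""))
    except ValueError:
        minutes = DEFAULT_MINUTES
    # Reject nan, inf and non-positive values.
    if not math.isfinite(minutes) or minutes <= 0:
        minutes = DEFAULT_MINUTES
    return timedelta(minutes=minutes)


def init_distributed_with_timeout() -> None:
    """Create the NCCL process group with the longer timeout (needs DeepSpeed)."""
    import deepspeed  # imported lazily so dist_timeout() runs without it

    deepspeed.init_distributed(dist_backend="nccl", timeout=dist_timeout())


if __name__ == "__main__":
    t = dist_timeout({TIMEOUT_ENV: "120"})
    print(int(t.total_seconds() * 1000))  # 7200000 (ms)
```

Every rank has to see the same value, so the variable would be exported before launching. The other usual route is to tokenize the corpus once ahead of time and have all ranks load the cached result, so no rank sits in a collective while another is still preprocessing.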

yangzhipeng1108 commented 1 year ago

Consider setting these two variables:
export CUDA_HOME=/usr/local/cuda-11.3/
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
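CUDA_HOME tells build tooling (for example PyTorch's C++/CUDA extension builder, which DeepSpeed uses to compile its ops) where the CUDA toolkit lives, and CUDA_VISIBLE_DEVICES limits which GPUs a process can see. The path and the eight-device list are the replier's own setup; the original poster's node has 4 A100s, so the list there would presumably be 0,1,2,3, and with the `deepspeed` launcher the device set is usually chosen with `--num_gpus` or `--include` instead. The sketch below is a hypothetical pre-flight check (both function names are made up for this note, and a Linux toolkit layout with `bin/nvcc` is assumed) that flags a missing toolkit or a device list that does not match the expected GPU count.

```python
import os
from typing import List, Mapping, Optional


def visible_gpus(env: Mapping[str, str]) -> Optional[List[str]]:
    """Parse CUDA_VISIBLE_DEVICES; None means unset (all GPUs visible)."""
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    # Entries may be indices ("0") or GPU UUIDs, so keep them as strings.
    return [d.strip() for d in raw.split(",") if d.strip()]


def check_cuda_env(env: Mapping[str, str], expected_gpus: int) -> List[str]:
    """Return human-readable problems with CUDA_HOME / CUDA_VISIBLE_DEVICES."""
    problems = []
    cuda_home = env.get("CUDA_HOME")
    if not cuda_home:
        problems.append("CUDA_HOME is not set")
    else:
        bin_dir = os.path.join(cuda_home, "bin")
        if not os.path.isfile(os.path.join(bin_dir, "nvcc")):
            problems.append(f"no nvcc under {bin_dir}")
    gpus = visible_gpus(env)
    if gpus is not None and len(gpus) != expected_gpus:
        problems.append(
            f"CUDA_VISIBLE_DEVICES lists {len(gpus)} entries but expected {expected_gpus}"
        )
    return problems


if __name__ == "__main__":
    # Values from the reply above, checked against a single 4x A100 node.
    env = {"CUDA_HOME": "/usr/local/cuda-11.3/", "CUDA_VISIBLE_DEVICES": "0,1,2,3,4,5,6,7"}
    for problem in check_cuda_env(env, expected_gpus=4):
        print(problem)
```

In a real script the first argument would be `os.environ`; passing a plain dict keeps the check easy to try without touching the shell.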