yuanzhoulvpi2017 / zero_nlp

Chinese NLP solutions (large models, data, models, training, inference)
MIT License
2.81k stars, 351 forks

Help: chatglm2 LoRA training error: RuntimeError: Expected is_sm80 to be true, but got false. #152

Closed by thirttyyy 1 year ago

thirttyyy commented 1 year ago

Hello, I followed the README and updated the transformers library to the latest version. The error message is as follows:

RuntimeError: Expected is_sm80 to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)

I am training on a single A40 GPU. What should I change?

thirttyyy commented 1 year ago

Solved. This is probably caused by the PyTorch CUDA architecture check; see this issue: https://github.com/pytorch/pytorch/issues/94883. Change

train_result = trainer.train(resume_from_checkpoint=checkpoint)

to:

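# Run training under bfloat16 autocast with the flash SDP kernel disabled,
# so PyTorch uses another scaled-dot-product-attention backend instead.
with torch.cuda.amp.autocast(enabled=True, dtype=torch.bfloat16), torch.backends.cuda.sdp_kernel(
        enable_flash=False):
    train_result = trainer.train(resume_from_checkpoint=checkpoint)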

After training finishes, check whether the training results are affected.
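If the same script has to run on different GPUs, the workaround can be applied only where it is needed. The following is a minimal sketch, not part of the original fix: `is_sm80` and `flash_guard` are hypothetical helper names, and it assumes the check behind the error message corresponds to a compute capability of exactly 8.0 (for example an A100), whereas an A40 reports 8.6.

```python
from contextlib import nullcontext


def is_sm80(capability):
    """Return True only for compute capability 8.0 (assumed meaning of is_sm80)."""
    return tuple(capability) == (8, 0)


def flash_guard(capability):
    """Return a context manager that disables the flash SDP kernel on non-sm80 GPUs.

    Hypothetical helper: on sm80 it returns a no-op context, otherwise it
    returns torch.backends.cuda.sdp_kernel(enable_flash=False).
    """
    if is_sm80(capability):
        return nullcontext()
    import torch  # imported lazily so is_sm80 can be used without torch installed
    return torch.backends.cuda.sdp_kernel(enable_flash=False)
```

In the training script this could be used as `with flash_guard(torch.cuda.get_device_capability()): train_result = trainer.train(resume_from_checkpoint=checkpoint)`, combined with the autocast context from the fix above if bfloat16 autocast is wanted. Newer PyTorch releases may deprecate `torch.backends.cuda.sdp_kernel`, so check the documentation for the installed version.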

jakeywu commented 10 months ago

Following https://github.com/pytorch/pytorch/issues/94883, I switched to a different torch version and the error went away:

pip install --force-reinstall --pre torch --index-url https://download.pytorch.org/whl/nightly/cu117