Help: chatglm2 LoRA training error: RuntimeError: Expected is_sm80 to be true, but got false.

yuanzhoulvpi2017 / zero_nlp

Chinese NLP solutions (large models, data, models, training, inference)

MIT License

2.81k stars, 351 forks

Help: chatglm2 LoRA training error: RuntimeError: Expected is_sm80 to be true, but got false. #152

Closed thirttyyy closed 1 year ago

thirttyyy commented 1 year ago

Hello, I followed the README and have updated the transformers library to the latest version. The error message is as follows:

RuntimeError: Expected is_sm80 to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)

I'm using a single A40 card. What should I change?

thirttyyy commented 1 year ago

Solved. This appears to be caused by PyTorch's CUDA architecture support. Following this issue, https://github.com/pytorch/pytorch/issues/94883 , change:

train_result = trainer.train(resume_from_checkpoint=checkpoint)

to:

import torch

# Disable the flash-attention SDP kernel and run training under bf16 autocast.
with torch.cuda.amp.autocast(enabled=True, dtype=torch.bfloat16), \
        torch.backends.cuda.sdp_kernel(enable_flash=False):
    train_result = trainer.train(resume_from_checkpoint=checkpoint)

Once training finishes, check whether the training results are affected.
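For anyone who wants to apply the workaround only on cards that need it, here is a minimal sketch. It assumes the failing check accepts only compute capability 8.0 (sm80, e.g. A100), which is how I read the error message and the linked PyTorch issue; an A40 reports (8, 6). The helper names `flash_backward_supported` and `sdp_context` are hypothetical and not part of this repo or of PyTorch.

```python
from contextlib import nullcontext


def flash_backward_supported(capability):
    """Return True if the (major, minor) compute capability is exactly sm80.

    Assumption: the check behind "Expected is_sm80 to be true" only passes
    on sm80 devices; other Ampere cards such as the A40 (8, 6) fail it.
    """
    return tuple(capability) == (8, 0)


def sdp_context(capability):
    """Return a context manager to wrap training in.

    On sm80 nothing is changed; elsewhere the flash-attention kernel is
    disabled so PyTorch falls back to the other SDP kernels.
    """
    if flash_backward_supported(capability):
        return nullcontext()
    import torch  # imported lazily so the pure helper above works without torch
    return torch.backends.cuda.sdp_kernel(enable_flash=False)
```

Possible usage in the training script: `with sdp_context(torch.cuda.get_device_capability()):` followed by the `trainer.train(...)` call. This only automates the choice; whether disabling the flash kernel affects speed or results on your setup still needs checking.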

jakeywu commented 10 months ago

https://github.com/pytorch/pytorch/issues/94883

pip install --force-reinstall --pre torch --index-url https://download.pytorch.org/whl/nightly/cu117

Switching to this torch version fixed it for me.