anakin87 closed this issue 3 months ago.
Thanks @anakin87! That's weird: I evaluate BF16 models all the time (automerged models, for example). Would you be able to reproduce this error with another BF16 model by any chance? Thanks a lot for the fix!
Thanks for the feedback. Thinking about it more, it's probably because I used PyTorch 2.2.0 for training. 🙂
Feel free to close the issue.
Cool! I added it to the troubleshooting section, since it might help others. Thanks.
Hey... Thanks for the great work!
While trying to evaluate a BF16 model, I encountered an error in my RunPod container:
"triu_tril_cuda_template" not implemented for 'BFloat16' (https://github.com/pytorch/pytorch/issues/101932).
Switching the image from
runpod/pytorch:2.0.1-py3.10-cuda11.8.0-devel-ubuntu22.04
to
runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04
fixed the issue.
I'm reporting this for others who may have the same problem. I'm not sure whether it makes sense to update the Colab notebook to use a newer image, or whether that might reveal other problems.
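For anyone who wants to check an image or environment before launching an evaluation, here is a minimal sketch (not part of the original report) that reads the PyTorch version from a RunPod image tag, or from the locally installed torch package, and compares it with 2.2.0. The helper names are hypothetical, and the tag parsing assumes the part after ":" starts with the PyTorch version, as in both images above. 2.2.0 is only the version reported to work in this thread; the upstream fix may have landed in an earlier release.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Assumption: 2.2.0 is simply the version reported to work in this thread,
# not necessarily the first PyTorch release with BF16 triu/tril on CUDA.
MIN_TORCH: Tuple[int, int, int] = (2, 2, 0)


def parse_version(text: str) -> Tuple[int, int, int]:
    """Return (major, minor, patch) from the start of text, e.g. '2.0.1+cu118'."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if m is None:
        raise ValueError(f"no major.minor.patch version at start of {text!r}")
    major, minor, patch = (int(g) for g in m.groups())
    return major, minor, patch


def image_torch_version(image: str) -> Tuple[int, int, int]:
    """PyTorch version from a tag such as 'runpod/pytorch:2.2.0-py3.10-...'.

    Assumes the text after the last ':' starts with the PyTorch version.
    """
    _, _, tag = image.rpartition(":")
    return parse_version(tag)


def installed_torch_version() -> Optional[Tuple[int, int, int]]:
    """Version of the locally installed torch package, or None if it is absent."""
    try:
        return parse_version(version("torch"))
    except PackageNotFoundError:
        return None


def is_recent_enough(v: Tuple[int, int, int]) -> bool:
    """True if v is at least MIN_TORCH (tuples compare element by element)."""
    return v >= MIN_TORCH


if __name__ == "__main__":
    for image in (
        "runpod/pytorch:2.0.1-py3.10-cuda11.8.0-devel-ubuntu22.04",
        "runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04",
    ):
        v = image_torch_version(image)
        status = "OK" if is_recent_enough(v) else "may hit the BF16 triu/tril error"
        print(image, status)
    local = installed_torch_version()
    print("installed torch:", local if local is not None else "not installed")
```

Comparing plain integer tuples keeps the sketch dependency-free; a project that already depends on the packaging library could use its Version class instead, which also handles pre-release and local version suffixes more rigorously.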