Closed: ergleb78 closed this issue 2 months ago.
V100 does not support FlashAttention. From the error message, it seems that `--num-scheduler-steps` requires FlashAttention. Try removing `--num-scheduler-steps 8`.
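A sketch of what the corrected launch might look like — the model name and the remaining flags here are assumptions for illustration, not the reporter's exact settings:

```bash
# Same launch as before, minus --num-scheduler-steps 8.
# --dtype float16 is assumed because V100 (compute capability 7.0)
# has no bfloat16 support.
vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct \
    --dtype float16
```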
@Huarong thanks a lot! It worked perfectly fine. I could not figure out where flash attention was coming from. Appreciate your help!
🐛 Describe the bug
I'm trying to run Llama 3.1-8B-Instruct on V100 (Volta). The model loads fine with the following settings:
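The original settings block was not preserved; below is a plausible reconstruction. Only `--num-scheduler-steps 8` is confirmed elsewhere in this thread — the model name and other flags are illustrative assumptions:

```bash
# Hypothetical reconstruction of the reporter's launch command.
# Only --num-scheduler-steps 8 is confirmed by this thread;
# the model name and remaining flags are assumptions.
vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct \
    --dtype float16 \
    --num-scheduler-steps 8
```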
Any chat completion request crashes the server.
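For illustration, a request of this shape against vLLM's OpenAI-compatible endpoint would trigger the crash (the host, port, and model name here are assumptions):

```bash
# Minimal chat completion request; any such request crashed the server.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```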