Although this doesn't solve the bug: if you'd like to get things working and stop vLLM from trying to use your integrated Radeon Graphics, you can set `CUDA_VISIBLE_DEVICES=-1`. I tried that together with `--device=cpu` and it is working correctly for me.
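For reference, a minimal sketch of that workaround via the Python API (the model name is illustrative, and `device="cpu"` is assumed to mirror the `--device cpu` CLI flag):

```python
import os

# Hide all GPUs *before* importing vllm/torch so the integrated
# Radeon (or any other accelerator) is never probed.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

from vllm import LLM, SamplingParams

# device="cpu" is assumed to be the Python-API counterpart of --device cpu.
llm = LLM(model="facebook/opt-125m", device="cpu")
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```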
+1 to this issue. The error seems to occur when you install vLLM without the CPU build. Currently the attention backend is decided based on whether the installed version of vLLM has the `+cpu` suffix or not (https://github.com/vllm-project/vllm/blob/main/vllm/attention/selector.py#L84 -> https://github.com/vllm-project/vllm/blob/main/vllm/utils.py#L131). This means that even when you specify the device to be `cpu`, vLLM tries to load one of the other attention backends.
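As a rough illustration of what those linked lines do (names simplified; this is a paraphrase, not the actual vLLM source):

```python
import vllm

def is_cpu_build() -> bool:
    # The real check in utils.py keys off the installed wheel's
    # version suffix ("+cpu"), not the requested device.
    return "cpu" in getattr(vllm, "__version__", "")

def select_attn_backend(device: str) -> str:
    # `device` is deliberately ignored here to mirror the bug:
    # because the check looks only at the build, passing device="cpu"
    # on a regular wheel never selects the CPU backend.
    if is_cpu_build():
        return "TORCH_SDPA"  # CPU attention backend
    return "FLASH_ATTN / XFORMERS"  # CUDA backends -> CUDA error on CPU-only runs
```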
https://github.com/vllm-project/vllm/pull/4962 is a potential solution (effectively passing the CPU attention-backend flag down from the worker).
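The general shape of such a fix might look like this (purely illustrative; not the actual diff in that PR):

```python
def select_attn_backend_fixed(device: str) -> str:
    # Honor the worker's device config in addition to the build suffix,
    # so --device cpu reaches the CPU backend on any wheel.
    if device == "cpu" or is_cpu_build():
        return "TORCH_SDPA"
    return "FLASH_ATTN / XFORMERS"
```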
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!
Your current environment
🐛 Describe the bug
Starting vLLM with --device 'cpu' throws a CUDA error. It should run on the CPU regardless.
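A minimal reproduction sketch on a standard (non-`+cpu`) wheel without a usable CUDA device (the model name is illustrative, and `device="cpu"` is assumed to be the Python-API counterpart of `--device 'cpu'`):

```python
from vllm import LLM

# Expected: CPU-only execution.
# Observed: the attention-backend selector still takes the CUDA path
# on this build, so initialization fails with a CUDA error.
llm = LLM(model="facebook/opt-125m", device="cpu")
```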