meta-llama / llama-models

Utilities intended for use with Llama models.

Unable to determine the device handle for GPU0000:17:00.0: Unknown Error #200

Open Fujiaoji opened 3 weeks ago

Fujiaoji commented 3 weeks ago

Hi team, thanks for sharing this great project. I have run into a strange issue. My setup is 4 A30 GPUs, Python 3.12, NVIDIA driver 560, and Fabric Manager 560. The model is "meta-llama/Llama-3.2-11B-Vision-Instruct", and I am running the example from its Hugging Face page: https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct

If I terminate the script with Ctrl+C while it is running, nvidia-smi stops working and only prints "Unable to determine the device handle for GPU0000:17:00.0: Unknown Error". After that, cuda:0 can no longer be used, while cuda:1 and cuda:2 still work. I cannot find the cause. Do you have any idea? Thanks.
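To confirm which GPU is affected after the interrupt, a minimal sketch like the following could pull the PCI bus ID out of the nvidia-smi output. It assumes the error text keeps the format quoted above; the helper names `failed_gpu_bus_ids` and `run_nvidia_smi` are hypothetical and only for illustration.

```python
import re
import subprocess

# Assumption: the error line looks like the one quoted in this issue, e.g.
# "Unable to determine the device handle for GPU0000:17:00.0: Unknown Error".
_HANDLE_ERROR = re.compile(
    r"Unable to determine the device handle for GPU\s*"
    r"([0-9A-Fa-f]{4,8}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}\.[0-9A-Fa-f])"
)


def failed_gpu_bus_ids(text):
    """Return the PCI bus IDs named in device-handle error lines."""
    return _HANDLE_ERROR.findall(text)


def run_nvidia_smi():
    """Run nvidia-smi and return stdout plus stderr, or '' if it cannot be run."""
    try:
        result = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, timeout=60
        )
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return result.stdout + result.stderr


if __name__ == "__main__":
    bad = failed_gpu_bus_ids(run_nvidia_smi())
    print("GPUs with handle errors:", bad or "none")
```

Running this before and after pressing Ctrl+C would show whether the same bus ID (here 0000:17:00.0) is reported each time. This only identifies the device, it does not explain the failure.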