yliu2702 opened 4 days ago
Could you please provide your running script? In general, this kind of issue occurs when CUDA has already been initialized in the parent process before vLLM forks its worker processes.
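For example, the failure mode can be reproduced without vLLM at all. A minimal sketch, assuming a Linux host (where `fork` is the default start method) with at least one GPU:

```python
import multiprocessing as mp
import torch

def child():
    # Fails in a forked child if the parent already holds a CUDA context:
    # "RuntimeError: Cannot re-initialize CUDA in forked subprocess."
    torch.zeros(1, device="cuda")

if __name__ == "__main__":
    torch.cuda.init()  # initialize CUDA in the parent process
    p = mp.get_context("fork").Process(target=child)
    p.start()
    p.join()
```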
I will be assigned 2 GPUs (or more if I request them). For example, `n_devices = torch.cuda.device_count() if torch.cuda.is_available() else 1` gives `n_devices = 2`. But I can't load the vLLM model. Can you explain more? Thank you!
`torch.cuda.is_available()` leads to the error you mentioned above. You can remove it and try again. `torch.cuda.get_device_capability('cuda:0')` should be avoided too, since it also initializes CUDA in the parent process.
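If you still need the device count before constructing the `LLM`, one option is to query it without touching `torch.cuda` in the parent process. A minimal sketch, assuming `nvidia-smi` is on the PATH; `count_gpus` is a hypothetical helper, not a vLLM API:

```python
import os
import subprocess

def count_gpus() -> int:
    # If the scheduler pinned GPUs via CUDA_VISIBLE_DEVICES, count those.
    visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    if visible is not None:
        return len([d for d in visible.split(",") if d.strip()])
    # Otherwise ask nvidia-smi, which runs in a separate process and
    # leaves this process's CUDA state untouched.
    try:
        out = subprocess.check_output(["nvidia-smi", "--list-gpus"], text=True)
        return len(out.strip().splitlines())
    except (OSError, subprocess.CalledProcessError):
        return 0

n_devices = count_gpus() or 1
```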
Your current environment
```text
Your output of `python collect_env.py` here
```
Model Input Dumps
No response
🐛 Describe the bug
When I load the LLM as:

```python
from vllm import LLM

llm = LLM(
    model=model_id,
    tokenizer=model_id,
    download_dir=cache_dir,
    dtype='half',
    tensor_parallel_size=2,      # shard the model across 2 GPUs
    gpu_memory_utilization=0.75,
    enable_lora=False,
)
```

I get the following error:

```text
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
```
I tried loading llama_3_8b with Hugging Face, and generation completes fine on 2 GPUs; I am only trying vLLM to speed up the generation process. Can anyone help me with this error? Thanks a lot!
Best, Yi
Nov 30th, 2024
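As the error message itself suggests, a common workaround is to force the 'spawn' start method before any CUDA state is created in the main process. Below is a minimal sketch, assuming vLLM's `VLLM_WORKER_MULTIPROC_METHOD` environment variable (present in recent vLLM releases) and placeholder values for `model_id` and `cache_dir`, which do not appear in the thread:

```python
# Must be set before vllm is imported.
import os
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"

from vllm import LLM

# Placeholder values; the reporter's actual model_id and cache_dir
# are not shown in the thread.
model_id = "meta-llama/Meta-Llama-3-8B"
cache_dir = "/path/to/cache"

if __name__ == "__main__":  # required under the spawn start method
    llm = LLM(
        model=model_id,
        tokenizer=model_id,
        download_dir=cache_dir,
        dtype="half",
        tensor_parallel_size=2,
        gpu_memory_utilization=0.75,
        enable_lora=False,
    )
```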