Micla-SHL opened 1 month ago
This might be a bug in PyTorch, which I fixed recently: https://github.com/pytorch/pytorch/pull/123243
The solution is to explicitly define the `TORCH_CUDA_ARCH_LIST` environment variable, e.g. `export TORCH_CUDA_ARCH_LIST=9.0a`
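The value should match your GPU's compute capability (9.0a is for Hopper; an A40 is Ampere, compute capability 8.6). A minimal sketch of how to look it up, assuming PyTorch can already see the GPU:

```python
import torch

# Query GPU 0's compute capability to decide what to put in
# TORCH_CUDA_ARCH_LIST (e.g. 8.6 for an A40).
major, minor = torch.cuda.get_device_capability(0)
print(f"export TORCH_CUDA_ARCH_LIST={major}.{minor}")
```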
Thanks, I now think it has nothing to do with vLLM. My machine could run nvidia-smi, but `torch.cuda.is_available()` returned False, so I reinstalled the graphics driver and everything went back to normal.
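For anyone hitting the same symptom, a minimal sketch of the check that separates a driver problem from a vLLM problem:

```python
import torch

# If nvidia-smi works but this prints False, the driver (or the CUDA
# runtime this PyTorch build targets) is broken, not vLLM.
print(torch.cuda.is_available())
print(torch.version.cuda)  # CUDA version PyTorch was built against
```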
I found that vLLM tries to consume as much GPU memory as possible: `INFO 07-24 15:04:47 gpu_executor.py:84] # GPU blocks: 11937, # CPU blocks: 2048`. This is inconsistent with the memory the same code consumed on the 4090. How can I set a limit when initializing it?
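vLLM pre-allocates a fraction of GPU memory for its KV cache, which is why the block count grows with the GPU's memory. This can be capped via the `gpu_memory_utilization` argument (default 0.9). A minimal sketch, assuming the offline `LLM` API and a placeholder model name:

```python
from vllm import LLM

# Cap vLLM's pre-allocation at ~50% of GPU memory instead of the
# default 90%; fewer KV-cache blocks will be reported at startup.
# "facebook/opt-125m" is only a placeholder model for illustration.
llm = LLM(model="facebook/opt-125m", gpu_memory_utilization=0.5)
```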
Anything you want to discuss about vllm.
I successfully installed vLLM and deployed inference on a 4090. Now I have switched to an A40, and compiling and installing keeps failing with errors saying there is a problem with my CUDA. I think CUDA is fine, and I have done some verification, but now I am out of ideas, so I am here asking for help.