Stargate256 opened 1 day ago

How would you like to use vllm

I want to serve two LLMs over an OpenAI-compatible API, but it doesn't work: I have two Tesla P100s, which have a compute capability lower than 7.0, and I don't know how to make vLLM work with them.
You need to add 6.0 to CUDA_SUPPORTED_ARCHS in CMakeLists.txt:
https://github.com/vllm-project/vllm/blob/855c8ae2c9a4085b1ebd66d9a978fb23f47f822c/CMakeLists.txt#L22-L23
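For reference, a sketch of that one-line edit (the sed pattern and the surrounding arch values are assumptions based on the linked commit; your checkout may differ):

```bash
# Prepend 6.0 (Pascal, i.e. the P100) to the arch list in CMakeLists.txt so
# kernels get compiled for it. After the edit the line should read roughly:
#   set(CUDA_SUPPORTED_ARCHS "6.0;7.0;7.5;8.0;8.6;8.9;9.0")
# (the other values are whatever your checkout already lists).
sed -i 's/CUDA_SUPPORTED_ARCHS "/CUDA_SUPPORTED_ARCHS "6.0;/' CMakeLists.txt
```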
Then you need to build from source manually with old-device support; refer to the installation docs.
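A rough sketch of that workflow, assuming the editable-install path from the vLLM installation docs (the exact steps for older GPUs may differ by version):

```bash
# Build vLLM from source with Pascal support (a sketch of the documented
# editable install; check the installation guide for your version).
git clone https://github.com/vllm-project/vllm.git
cd vllm
# ...edit CMakeLists.txt as described above, then compile and install:
pip install -e .
```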
BTW, I don't think you'll get the expected performance running an LLM on a P100, because I remember it has some numeric issues with FP16: https://discuss.pytorch.org/t/cnn-fp16-slower-than-fp32-on-tesla-p100/12146/2
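If you go ahead anyway, a hypothetical way to serve the two models once the build works is one OpenAI-compatible server per P100. The model names and ports below are placeholders, and this assumes your vLLM version's --dtype flag accepts float32 (which sidesteps FP16 at roughly double the weight memory):

```bash
# Hypothetical setup: one OpenAI-compatible server per GPU/model.
# <model-a>/<model-b> and the ports are placeholders; --dtype float32
# avoids FP16 in case half precision misbehaves on Pascal.
CUDA_VISIBLE_DEVICES=0 python -m vllm.entrypoints.openai.api_server \
    --model <model-a> --port 8000 --dtype float32
CUDA_VISIBLE_DEVICES=1 python -m vllm.entrypoints.openai.api_server \
    --model <model-b> --port 8001 --dtype float32
```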