runpod-workers / worker-vllm

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
MIT License

Add TORCH_CUDA_ARCH_LIST to Dockerfile #34

Closed bartlettD closed 9 months ago

bartlettD commented 9 months ago

Believe this fixes building Docker images when a GPU is not present. Have successfully completed a Docker build and deployed the image to RunPod.

The architectures should match the capabilities of the serverless workers, but if I'm missing any, let me know.

Should close #25 too hopefully.
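For context, the change described above amounts to pinning the target CUDA architectures at build time so the compiler doesn't try to detect them from an attached GPU. A minimal Dockerfile sketch (the architecture values below are illustrative, not necessarily the PR's exact list):

```dockerfile
# Hypothetical excerpt: with no GPU present, PyTorch extension builds cannot
# auto-detect a compute capability, so kernel compilation fails. Setting
# TORCH_CUDA_ARCH_LIST tells nvcc which architectures to target instead.
# Example values: 8.0 = A100, 8.6 = A10/RTX 30xx, 8.9 = L4/Ada, 9.0 = H100.
ENV TORCH_CUDA_ARCH_LIST="8.0;8.6;8.9;9.0"

# Any CUDA-extension builds after this point compile for the listed
# architectures rather than querying a physical device.
RUN pip install vllm
```

The trade-off is longer builds and larger images (one set of kernels per listed architecture), in exchange for builds that work on GPU-less machines.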

alpayariyak commented 9 months ago

This did not solve the issue on a non-GPU Mac, unfortunately.

bartlettD commented 9 months ago

That is unfortunate; I don't have a Mac to test with, so I can't follow this up any further.

alpayariyak commented 9 months ago

No worries, your effort is greatly appreciated! We are happy to report that we have found a workaround and will push the update with it soon!

alpayariyak commented 9 months ago

In the latest version, we have changed the base image to one that already has vLLM compiled, which solves this problem.
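As a sketch of that approach (the base image tag below is a placeholder, not necessarily the one the worker actually uses): starting from an image that already ships a compiled vLLM means no CUDA kernels are built during `docker build`, so neither a GPU nor a TORCH_CUDA_ARCH_LIST override is needed on the build machine.

```dockerfile
# Hypothetical sketch: a base image with vLLM precompiled sidesteps
# GPU detection at build time entirely. Placeholder tag below.
FROM vllm/vllm-openai:latest

# Only the worker's own handler code is added; no compilation happens here,
# so this builds the same on a Mac, CI runner, or GPU host.
COPY src /src
CMD ["python", "/src/handler.py"]
```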