runpod-workers / worker-vllm

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
MIT License

Allow any vLLM engine args as env vars, Update vLLM, refactor #82

Closed: alpayariyak closed this 3 weeks ago

alpayariyak commented 1 month ago

This simplifies the update process and makes the code cleaner by removing the hardcoded engine arguments, defaults, and env vars, and instead matching env vars to vLLM's AsyncEngineArgs directly by key.
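For context, the approach looks roughly like this (a minimal sketch, not the exact code in this PR; it assumes AsyncEngineArgs is a dataclass, as in recent vLLM releases, and the uppercase naming convention and JSON-based casting are illustrative):

```python
# Sketch: build AsyncEngineArgs from environment variables by key,
# instead of hardcoding every argument, default, and env var.
import json
import os
from dataclasses import fields

from vllm import AsyncEngineArgs


def engine_args_from_env() -> AsyncEngineArgs:
    kwargs = {}
    for field in fields(AsyncEngineArgs):
        raw = os.getenv(field.name.upper())  # e.g. MAX_MODEL_LEN -> max_model_len
        if raw is None:
            continue
        try:
            # JSON parsing covers ints, floats, bools ("true"/"false") and lists
            kwargs[field.name] = json.loads(raw)
        except json.JSONDecodeError:
            kwargs[field.name] = raw  # plain strings, e.g. the model name
    return AsyncEngineArgs(**kwargs)
```

With a mapping like this, any engine arg added upstream becomes configurable without touching the worker, and updating vLLM no longer requires syncing a hardcoded list.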

nerdylive123 commented 1 month ago

Yeah, nice! Using a mapping works well. Also, can you update the vLLM base image? They have updated it 😊

alpayariyak commented 1 month ago

I've started a new position, so @pandyamarut will be taking over. Left to do:

TimPietrusky commented 1 month ago

@alpayariyak thank you!

@pandyamarut welcome!

TimPietrusky commented 1 month ago

@pandyamarut Update: it worked now just by repeating the command, so you can ignore my other message :D

Previous message

I tried to build this image locally based on this branch, but got this error:

884.7 Collecting flashinfer
885.7   Downloading https://github.com/flashinfer-ai/flashinfer/releases/download/v0.1.1/flashinfer-0.1.1%2Bcu121torch2.3-cp310-cp310-linux_x86_64.whl (1262.5 MB)
1034.9      ━━━━━━━━━━━━━━━━━━━━━╸                   0.7/1.3 GB 4.1 MB/s eta 0:02:23
1035.5 ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
1035.5     flashinfer from https://github.com/flashinfer-ai/flashinfer/releases/download/v0.1.1/flashinfer-0.1.1%2Bcu121torch2.3-cp310-cp310-linux_x86_64.whl#sha256=90da45996eefaf82ff77a53c8dcb813415cddbcfc9981e20fcbc33660e019429:
1035.5         Expected sha256 90da45996eefaf82ff77a53c8dcb813415cddbcfc9981e20fcbc33660e019429
1035.5              Got        8e74fee1baf4e9e896c479f85844d31c2ee251fa4cf0d5b778a7d9aac9fca8c5
1035.5
------
Dockerfile:15
--------------------
  14 |     # Install vLLM (switching back to pip installs since issues that required building fork are fixed and space optimization is not as important since caching) and FlashInfer
  15 | >>> RUN python3 -m pip install vllm==0.5.3.post1 && \
  16 | >>>     python3 -m pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.3
  17 |
--------------------
ERROR: failed to solve: process "/bin/sh -c python3 -m pip install vllm==0.5.3.post1 &&     python3 -m pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.3" did not complete successfully: exit code: 1

I was using this command:

docker build -t runpod/worker-vllm:dev  --platform linux/amd64 .

I'm running this on Windows, but I'm not sure whether that affects anything.
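For anyone else hitting the same error: the log above points to a corrupted or truncated download of the ~1.3 GB flashinfer wheel, which re-running the build resolved. If you want to confirm that, one option (not from this thread, just a standard check) is to download the wheel directly and compare its sha256 with the hash pip expected:

```bash
# Hypothetical check: fetch the wheel from the URL in the error log and
# compare its sha256 with the expected value pip printed (90da45...).
wget -O flashinfer.whl "https://github.com/flashinfer-ai/flashinfer/releases/download/v0.1.1/flashinfer-0.1.1%2Bcu121torch2.3-cp310-cp310-linux_x86_64.whl"
sha256sum flashinfer.whl
```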