runpod-workers / worker-vllm

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.

How can I update to vLLM v0.4.1 for Llama 3 support? #66

Closed Lhemamou closed 2 months ago

Lhemamou commented 4 months ago

Hello everyone,

I would like to update the vLLM version to v0.4.1 in order to get access to Llama 3, but I don't know how to modify the fork runpod/vllm-fork-for-sls-worker. Could you please guide me? Happy to help in any way!

nuckcrews commented 3 months ago

+1

nerdylive123 commented 3 months ago

+1, looking to figure this out soon

houmie commented 3 months ago

Same issue here. There is a blocking bug affecting Llama 3 that was fixed in v0.4.1.

arthrod commented 3 months ago

Pretty please

alpayariyak commented 3 months ago

Hi all, thank you for raising this issue! I have just merged the vLLM 0.4.2 update into main. You can use it by changing the Docker image in your endpoint from runpod/worker-vllm:stable-cudaX.X.X to runpod/worker-vllm:dev-cudaX.X.X. From my testing so far, everything seems in order, but if you notice any issues, please let me know. After an initial test period, I'll release the update officially to replace the default stable images. Thanks all!
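
For anyone trying the dev image, here is a minimal sanity-check sketch against the worker's OpenAI-compatible route (base URL pattern as described in the repo README). The endpoint ID, API key, and model name below are placeholders; substitute the values from your own RunPod endpoint.

```python
# Minimal sanity check against a worker-vllm endpoint running the dev image.
# YOUR_ENDPOINT_ID and YOUR_API_KEY are placeholders; the model name assumes
# the endpoint was deployed with meta-llama/Meta-Llama-3-8B-Instruct.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.runpod.ai/v2/YOUR_ENDPOINT_ID/openai/v1",
)

# Request a short completion to confirm the Llama 3 model loads and responds.
response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=32,
)
print(response.choices[0].message.content)
```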

alpayariyak commented 2 months ago

Merged into the stable release.