vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Installation]: container images - too big and need to publish also cpu versions #7609

Open yairyairyair opened 2 months ago

yairyairyair commented 2 months ago

Your current environment

Not applicable; this is not a problem specific to my own environment.

How you are installing vllm

docker pull vllm/vllm-openai:latest

Note that I would also like a CPU version to be published, for example:

docker pull vllm/vllm-openai:latest-cpu

Also, the regular (GPU) image is too big at 5GB. Can we do something to shrink it to under 1GB?

If the project needs help with this specific issue, I can help. I want these container optimizations.

simon-mo commented 2 months ago

Optimization welcomed!

antoniomdk commented 2 months ago

Just installing torch and ray in an empty environment generates a Docker image of ~2GiB. I think it's unrealistic to try to cut it down to < 1GiB. The CUDA libraries and other pre-compiled wheels probably make up >50% of the total image size.

yairyairyair commented 2 months ago

If it's 2GB, that is still better than the 9GB image published on Docker Hub. Can we find out why it is 9GB and not 2GB? Maybe it is caused by the GitHub Actions build or something similar.
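To see where the extra gigabytes come from, the per-layer sizes of a locally pulled image can be ranked. The sketch below assumes the layer list was produced with `docker history --no-trunc --format '{{.Size}}\t{{.CreatedBy}}' vllm/vllm-openai:latest` and that sizes use Docker's decimal units (B, kB, MB, GB, TB). The helper names `parse_size` and `rank_layers` are hypothetical, and the sample lines are illustrative, not real measurements of the vLLM image.

```python
import re

# Decimal multipliers for human-readable sizes (assumed unit set).
_UNITS = {"b": 1, "kb": 10**3, "mb": 10**6, "gb": 10**9, "tb": 10**12}


def parse_size(text):
    """Convert a size string such as '1.5GB' or '0B' to bytes."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*([kmgt]?b)\s*", text, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return round(float(match.group(1)) * _UNITS[match.group(2).lower()])


def rank_layers(history_lines):
    """Return (bytes, created_by) pairs sorted from largest to smallest layer."""
    layers = []
    for line in history_lines:
        if not line.strip():
            continue
        size, _, created_by = line.partition("\t")
        layers.append((parse_size(size), created_by.strip()))
    return sorted(layers, key=lambda layer: layer[0], reverse=True)


if __name__ == "__main__":
    # Illustrative input only; replace with real `docker history` output.
    sample = [
        "0B\tENV EXAMPLE=1",
        "1.5GB\tRUN pip install example-package",
        "250MB\tRUN apt-get install -y example-tool",
    ]
    for size, created_by in rank_layers(sample):
        print(f"{size / 10**9:8.2f} GB  {created_by}")
```

The largest entries point at the build steps worth inspecting first, for example a step that leaves a package cache or build toolchain in the final image.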