vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

OpenAI Server issue when running on Apptainer (HPC) #3068

Open vishruth-v opened 6 months ago

vishruth-v commented 6 months ago

Hi, I'm trying to run the OpenAI-compatible server on an HPC cluster (which has Apptainer instead of Docker). I converted the Docker image to an Apptainer .sif file.

However, when I try to run the .sif file, I get this error:

/usr/bin/python3: Error while finding module specification for 'vllm.entrypoints.openai.api_server' (ModuleNotFoundError: No module named 'vllm')

I assume it's because vllm isn't installed in the container/image's Python environment (which it should be if it's equivalent to an image built with the Dockerfile).

I'm new to the Apptainer environment and would appreciate any help in getting this running on HPC clusters!

simon-mo commented 6 months ago

You might need to deal with a Python path issue. By default, our Docker image does not install vLLM as a Python package, so the server can only be run from the /workspace directory. This is not by design but rather a coincidence.
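To illustrate the path behaviour described here, below is a minimal sketch (not taken from the vLLM image itself). It assumes a hypothetical package named `demo_pkg` in a temporary directory that stands in for a source checkout such as /workspace. A package that is not installed is only importable when its parent directory is on `sys.path`, and `python -m` puts the current working directory there, which would explain why the launch directory matters.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def is_importable(name: str) -> bool:
    """Return True if a top-level module or package can be found on sys.path."""
    importlib.invalidate_caches()  # pick up directories created after startup
    return importlib.util.find_spec(name) is not None


# Build a throwaway source tree containing a hypothetical package, standing in
# for a checkout directory that holds an uninstalled package folder.
checkout = Path(tempfile.mkdtemp())
(checkout / "demo_pkg").mkdir()
(checkout / "demo_pkg" / "__init__.py").write_text("")

before = is_importable("demo_pkg")  # checkout is not on sys.path yet
sys.path.insert(0, str(checkout))   # roughly what `python -m` does with the cwd
after = is_importable("demo_pkg")

print(before, after)
```

The same check with `is_importable("vllm")`, run inside the container from different directories, should show whether the package is found only from the source directory.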

simon-mo commented 6 months ago

Please let us know whether this fixes things.

mcleish7 commented 5 months ago

Hi,

I am new to Apptainer and vllm. What is the /workspace directory, please, @simon-mo?

Thanks

vishruth-v commented 5 months ago

Hi @simon-mo, so sorry for the delay. I missed your email as I was travelling. Thanks for the advice regarding the path issue!

When I switched to running the container from inside the base vllm directory, the issue I mentioned in the description was solved (i.e., the vllm folder is found). However, I now run into issues with other dependencies:

ModuleNotFoundError: No module named 'prometheus_client'

I haven't been able to figure out why the requirements are not accessible. My assumption is that this could be another issue with the path of requirements.txt.
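One way to narrow this down is a small diagnostic run with the container's interpreter. The sketch below is only an assumption about what is useful to check: it prints which interpreter is running and where it searches for packages, then lists which of the named modules it cannot find. The helper `missing_modules` is hypothetical, and the module names come from the errors quoted in this thread.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Which interpreter is running, and where it looks for packages.
print("interpreter:", sys.executable)
for entry in sys.path:
    print("search path:", entry)

# Module names taken from the errors above; extend the list as needed.
print("missing:", missing_modules(["vllm", "prometheus_client"]))
```

If `prometheus_client` is reported missing while the search paths point at the expected site-packages directory, the package is probably not installed for that interpreter rather than being hidden by a path problem.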

Additionally, as I'm running this on an HPC cluster, the base path /workspace that's used in the Dockerfile/image is privileged and I don't have access to it. (I instead run from <personal_workspace_path>/vllm