Closed humza-sami closed 5 months ago
This is a bug of cupy, and we plan to remove the cupy dependency with https://github.com/vllm-project/vllm/pull/3442 . Please stay tuned, or you can try the Docker image built during the CI of that PR, e.g. `docker pull us-central1-docker.pkg.dev/vllm-405802/vllm-ci-test-repo/vllm-test:a3c2340ae36ce8ee782691d30111377eaf7ae6ce`.
Feedback is welcome!
@youkaichao Thanks, I will give it a try as well.
I was having an issue with my FastAPI/uvicorn command. I have no idea how the `--reload` flag relates to cupy, but it was causing the issue.
I was using the following command:

```shell
uvicorn api.main:app --reload --port 8080 --host 0.0.0.0
```

After I removed the `--reload` flag, the error went away. The working command is:

```shell
uvicorn api.main:app --port 8080 --host 0.0.0.0
```
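For anyone who still wants auto-reload during development, a common workaround (a sketch of the general pattern, not something from this thread, and all names below are hypothetical) is to defer model construction until it is first needed, so that uvicorn's `--reload` watcher process never initializes the GPU runtime at import time:

```python
# Hypothetical sketch: construct the expensive model lazily on first use
# instead of at module import, so the --reload watcher process that merely
# imports the app never touches the GPU.
_model = None

def get_model():
    """Create the (stand-in) model once, then reuse the same instance."""
    global _model
    if _model is None:
        # In a real app this would be something like:
        #   _model = vllm.LLM(model=..., tensor_parallel_size=...)
        _model = object()
    return _model
```

A FastAPI route would then call `get_model()` inside the handler rather than holding a module-level `LLM` instance.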
Glad you figured it out. And do give our new Docker image a try and share feedback! That will give us confidence in the next step of removing cupy.
I am facing an error when attempting to use multiple GPUs with a FastAPI backend. The error arises when integrating the multi-GPU code into the FastAPI backend API. Interestingly, the same code works correctly when executed on its own across multiple GPUs. I am simply loading the model after importing the libraries:

```python
LLM(model=model, max_model_len=16000, tensor_parallel_size=4)
```

```
TypeError: cannot pickle '_thread.lock' object
```
Error Traceback
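For context, here is a minimal reproduction of the error message itself (my own sketch, not taken from the traceback above): with `tensor_parallel_size > 1`, worker processes are spawned, and Python refuses to pickle any object that holds a raw thread lock, which is exactly the exception reported:

```python
import pickle
import threading

# A bare threading.Lock cannot be serialized; trying to send one (or any
# object containing one) to a subprocess raises the same TypeError seen
# above: "cannot pickle '_thread.lock' object".
lock = threading.Lock()
try:
    pickle.dumps(lock)
except TypeError as exc:
    print(exc)
```

So the usual culprit is a module-level object (model, client, logger, etc.) containing a lock that ends up being shipped to the spawned workers.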