vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Usage]: Multiple-GPU usage with FastAPI and uvicorn not working #3612

Closed by humza-sami 5 months ago

humza-sami commented 5 months ago

I am facing an error when attempting to use multiple GPUs with a FastAPI backend. The error arises when I integrate the multi-GPU code into the FastAPI backend; interestingly, the same code works correctly when run on its own. I am simply loading the model right after importing the libraries:

LLM(model=model, max_model_len=16000, tensor_parallel_size=4)
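
Concretely, the integration looks roughly like this. This is a minimal sketch: the api/main.py layout, the model name, and the /generate endpoint are illustrative placeholders, not my exact code.

# api/main.py -- minimal sketch of the setup (file layout and model are placeholders)
from fastapi import FastAPI
from vllm import LLM, SamplingParams

app = FastAPI()

# The engine is created at module import time, right after the library imports.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2",  # placeholder model
          max_model_len=16000,
          tensor_parallel_size=4)

@app.post("/generate")
def generate(prompt: str):
    outputs = llm.generate([prompt], SamplingParams(max_tokens=256))
    return {"text": outputs[0].outputs[0].text}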

TypeError: cannot pickle '_thread.lock' object

Error Traceback

    self.llm = LLM(model=model, max_model_len=16000, tensor_parallel_size=4)
  File "/usr/local/lib/python3.8/dist-packages/vllm/entrypoints/llm.py", line 109, in __init__
    self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "/usr/local/lib/python3.8/dist-packages/vllm/engine/llm_engine.py", line 391, in from_engine_args
    engine = cls(*engine_configs,
  File "/usr/local/lib/python3.8/dist-packages/vllm/engine/llm_engine.py", line 126, in __init__
    self._init_workers_ray(placement_group)
  File "/usr/local/lib/python3.8/dist-packages/vllm/engine/llm_engine.py", line 304, in _init_workers_ray
    self._run_workers("init_model",
  File "/usr/local/lib/python3.8/dist-packages/vllm/engine/llm_engine.py", line 1041, in _run_workers
    driver_worker_output = getattr(self.driver_worker,
  File "/usr/local/lib/python3.8/dist-packages/vllm/worker/worker.py", line 94, in init_model
    init_distributed_environment(self.parallel_config, self.rank,
  File "/usr/local/lib/python3.8/dist-packages/vllm/worker/worker.py", line 275, in init_distributed_environment
    cupy_utils.init_process_group(
  File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/parallel_utils/cupy_utils.py", line 90, in init_process_group
    _NCCL_BACKEND = NCCLBackendWithBFloat16(world_size, rank, host, port)
  File "/usr/local/lib/python3.8/dist-packages/cupyx/distributed/_nccl_comm.py", line 70, in __init__
    self._init_with_tcp_store(n_devices, rank, host, port)
  File "/usr/local/lib/python3.8/dist-packages/cupyx/distributed/_nccl_comm.py", line 88, in _init_with_tcp_store
    self._store.run(host, port)
  File "/usr/local/lib/python3.8/dist-packages/cupyx/distributed/_store.py", line 100, in run
    p.start()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle '_thread.lock' object

How would you like to use vllm

No response

youkaichao commented 5 months ago

This is a bug in cupy, and we plan to remove the cupy dependency in https://github.com/vllm-project/vllm/pull/3442 . Please stay tuned, or you can try the Docker image built during CI for that PR, e.g. docker pull us-central1-docker.pkg.dev/vllm-405802/vllm-ci-test-repo/vllm-test:a3c2340ae36ce8ee782691d30111377eaf7ae6ce .

Feedback is welcome!

humza-sami commented 5 months ago

@youkaichao Thanks, I will give that a try as well. My issue turned out to be with the uvicorn command. I have no idea what the relation between the --reload flag and cupy is, but it was what caused the error. I was using the following command:

uvicorn api.main:app --reload --port 8080 --host 0.0.0.0

Removing the --reload flag fixed the error. The working command is:

uvicorn api.main:app --port 8080 --host 0.0.0.0
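
For anyone who hits this later, my best guess at the mechanism (I have not verified it): --reload makes uvicorn run the application in a reloader-managed subprocess, and in that environment cupy's TCP store fails while spawning its helper process because a _thread.lock ends up in the state being pickled. Independent of the exact cause, one way to avoid constructing the engine as an import side effect is to build it inside FastAPI's lifespan hook instead of at import time; a sketch under the same placeholder names as above:

# api/main.py -- defensive variant (my own sketch, placeholder names as above):
# build the engine inside the application's lifespan hook instead of at import
# time, so re-imports by dev reloaders never construct it as a side effect.
from contextlib import asynccontextmanager

from fastapi import FastAPI
from vllm import LLM, SamplingParams

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Runs once in the serving process, after the module import has finished.
    app.state.llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2",  # placeholder
                        max_model_len=16000,
                        tensor_parallel_size=4)
    yield

app = FastAPI(lifespan=lifespan)

@app.post("/generate")
def generate(prompt: str):
    outputs = app.state.llm.generate([prompt], SamplingParams(max_tokens=256))
    return {"text": outputs[0].outputs[0].text}

With this layout the reloader can re-import the module freely; the engine is only constructed when the serving process actually starts.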

youkaichao commented 5 months ago

Glad you figured it out. And please do give the new Docker image a try and share your feedback! That will give us confidence for the next step of removing cupy.