Open orderer0001 opened 4 months ago
You can use --pipeline-parallel-size 3; see https://docs.vllm.ai/en/latest/serving/distributed_serving.html
Thank you for your guidance. Should I set the parameter pipeline-parallel-size to 3? Should tensor_parallel_size also be set to 3?
To be specific, it is --pipeline-parallel-size 3 --tensor-parallel-size 1; the latter can be omitted as it is the default.
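Those flags map to the pipeline_parallel_size and tensor_parallel_size fields of EngineArgs. A minimal sketch of setting them programmatically through the async engine (a sketch only; whether it runs depends on the vLLM version, and as noted further down the thread, the LLM class itself does not accept pipeline parallelism here):

    from vllm.engine.arg_utils import AsyncEngineArgs
    from vllm.engine.async_llm_engine import AsyncLLMEngine

    # Split the model across 3 GPUs by layer (pipeline parallel),
    # keep each layer whole on a single GPU (tensor parallel = 1).
    engine_args = AsyncEngineArgs(
        model="/data/big_model/gemma-2-27b-it",
        pipeline_parallel_size=3,
        tensor_parallel_size=1,               # the default, shown for clarity
        distributed_executor_backend="ray",   # the error below says the multiprocessing backend does not support PP
    )
    engine = AsyncLLMEngine.from_engine_args(engine_args)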
After setting the pipeline-parallel-size parameter, an error is reported:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/logs/drone-exec/envir6767/model_lib.py", line 263, in load_with_engine
    self.engine = LLMEngine.from_engine_args(EngineArgs.from_cli_args(self.args))
  File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 385, in from_engine_args
    engine_config = engine_args.create_engine_config()
  File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 670, in create_engine_config
    parallel_config = ParallelConfig(
  File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/vllm/config.py", line 698, in __init__
    self._verify_args()
  File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/vllm/config.py", line 704, in _verify_args
    raise NotImplementedError("Pipeline parallelism is not supported "
NotImplementedError: Pipeline parallelism is not supported yet with multiprocessing.
Related code (this is written into my own model-loading module):

    if 'gemma' in self.name.lower():
        print("Model is gemma")  # original message: "模型为gemma"
        self.args.pipeline_parallel_size = 3
        self.args.tensor_parallel_size = 1
    self.engine = LLMEngine.from_engine_args(EngineArgs.from_cli_args(self.args))
Even if I follow the instructions exactly as in the document, it still won’t work.
from vllm import LLM
llm = LLM('/data/big_model/gemma-2-27b-it', pipeline_parallel_size=3)

INFO 07-20 13:05:45 config.py:695] Defaulting to use mp for distributed inference
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 150, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 385, in from_engine_args
    engine_config = engine_args.create_engine_config()
  File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 670, in create_engine_config
    parallel_config = ParallelConfig(
  File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/vllm/config.py", line 698, in __init__
    self._verify_args()
  File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/vllm/config.py", line 704, in _verify_args
    raise NotImplementedError("Pipeline parallelism is not supported "
NotImplementedError: Pipeline parallelism is not supported yet with multiprocessing.
it is a new feature, try to follow https://docs.vllm.ai/en/latest/getting_started/installation.html to install the latest main, or wait for the next release.
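After reinstalling, for example, a quick sanity check that the running environment actually picked up the newer build:

    import vllm
    # The freshly installed main/dev build should report a newer version string
    # than the release that raised the error above.
    print(vllm.__version__)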
Is it possible to set the distributed_executor_backend parameter? After I set it:

    from vllm import LLM
    llm = LLM('/data/big_model/gemma-2-27b-it', distributed_executor_backend="ray", pipeline_parallel_size=3)

I got another error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 150, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 385, in from_engine_args
    engine_config = engine_args.create_engine_config()
  File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 670, in create_engine_config
    parallel_config = ParallelConfig(
  File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/vllm/config.py", line 698, in __init__
    self._verify_args()
  File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/vllm/config.py", line 704, in _verify_args
    raise NotImplementedError("Pipeline parallelism is not supported "
NotImplementedError: Pipeline parallelism is not supported yet with multiprocessing.
please give a minimal reproducible example with full log.
Oh, one thing to note: pipeline_parallel_size is not supported in LLM. You need to use it through the OpenAI API server. Please carefully read the doc: https://docs.vllm.ai/en/latest/serving/distributed_serving.html
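For example, assuming a vLLM build recent enough to include pipeline-parallel support, the server would be launched with something like python -m vllm.entrypoints.openai.api_server --model /data/big_model/gemma-2-27b-it --pipeline-parallel-size 3 --tensor-parallel-size 1 (see the doc above for the exact flags), and then queried with the standard OpenAI-compatible client:

    # Client-side sketch: assumes the vLLM OpenAI-compatible server is running on localhost:8000.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # any placeholder key unless the server sets --api-key
    completion = client.chat.completions.create(
        model="/data/big_model/gemma-2-27b-it",  # served model name defaults to the model path
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(completion.choices[0].message.content)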
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
Your current environment
How would you like to use vllm
I have three 4090 GPUs, 3 × 24 GB of memory in total, and the model I need to deploy requires at least 52 GB. The issue is that tensor-parallel deployment requires the model's attention head count (32) to be divisible by the number of GPUs, and 32 is not divisible by 3, so that is clearly not feasible. Can vllm use a method similar to device_map in transformers to specify how each layer is placed across the GPUs, to solve this problem?
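Pipeline parallelism (the --pipeline-parallel-size 3 suggestion above) sidesteps that head-count constraint by splitting the model by layer rather than by attention head, which is close in spirit to device_map-style placement. A rough illustration of how contiguous layer ranges could be assigned to 3 GPUs; the layer count of 46 is an assumption about Gemma-2-27B, and this is not vLLM's actual placement code:

    # Illustrative only: assign contiguous decoder-layer ranges to pipeline stages.
    def split_layers(num_layers: int, num_stages: int) -> list[range]:
        base, extra = divmod(num_layers, num_stages)
        ranges, start = [], 0
        for stage in range(num_stages):
            size = base + (1 if stage < extra else 0)  # spread the remainder over the first stages
            ranges.append(range(start, start + size))
            start += size
        return ranges

    # Assuming ~46 decoder layers for Gemma-2-27B, split across 3 GPUs:
    for gpu, layers in enumerate(split_layers(46, 3)):
        print(f"GPU {gpu}: layers {layers.start}-{layers.stop - 1}")
    # GPU 0: layers 0-15
    # GPU 1: layers 16-30
    # GPU 2: layers 31-45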