efficentdet opened this issue 1 month ago
It finally failed with a timeout error. How can I fix this? It's really urgent.
$sudo CUDA_VISIBLE_DEVICES=0 PYTHONPATH="/GlobalData/rijian.lrj/miniconda3/envs/vllm_shc/lib/python3.8/site-packages/" python vllm_test.py
WARNING 08-05 11:02:58 config.py:1425] Casting torch.bfloat16 to torch.float16.
INFO 08-05 11:02:58 llm_engine.py:176] Initializing an LLM engine (v0.5.3.post1) with config: model='/ProjectRoot/long_content_LLM/qwen/Qwen2-1___5B-Instruct', speculative_config=None, tokenizer='/ProjectRoot/long_content_LLM/qwen/Qwen2-1___5B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None), seed=0, served_model_name=/ProjectRoot/long_content_LLM/qwen/Qwen2-1___5B-Instruct, use_v2_block_manager=False, enable_prefix_caching=False)
INFO 08-05 11:02:59 selector.py:151] Cannot use FlashAttention-2 backend for Volta and Turing GPUs.
INFO 08-05 11:02:59 selector.py:54] Using XFormers backend.
[W socket.cpp:464] [c10d] The server socket cannot be initialized on [::]:36893 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:697] [c10d] The client socket cannot be initialized to connect to [11-88-234-70.gpu-exporter.prometheus.svc.cluster.local]:36893 (errno: 97 - Address family not supported by protocol).
[E socket.cpp:957] [c10d] The client socket has timed out after 600s while trying to connect to (11.88.234.70, 36893).
Traceback (most recent call last):
  File "vllm_test.py", line 32, in <module>
    llm = LLM(model=path, trust_remote_code=True, tensor_parallel_size=1, dtype=torch.float16)
  File "/GlobalData/rijian.lrj/miniconda3/envs/vllm_shc/lib/python3.8/site-packages/vllm/entrypoints/llm.py", line 155, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/GlobalData/rijian.lrj/miniconda3/envs/vllm_shc/lib/python3.8/site-packages/vllm/engine/llm_engine.py", line 441, in from_engine_args
    engine = cls(
  File "/GlobalData/rijian.lrj/miniconda3/envs/vllm_shc/lib/python3.8/site-packages/vllm/engine/llm_engine.py", line 251, in __init__
    self.model_executor = executor_class(
  File "/GlobalData/rijian.lrj/miniconda3/envs/vllm_shc/lib/python3.8/site-packages/vllm/executor/executor_base.py", line 47, in __init__
    self._init_executor()
  File "/GlobalData/rijian.lrj/miniconda3/envs/vllm_shc/lib/python3.8/site-packages/vllm/executor/gpu_executor.py", line 35, in _init_executor
    self.driver_worker.init_device()
  File "/GlobalData/rijian.lrj/miniconda3/envs/vllm_shc/lib/python3.8/site-packages/vllm/worker/worker.py", line 132, in init_device
    init_worker_distributed_environment(self.parallel_config, self.rank,
  File "/GlobalData/rijian.lrj/miniconda3/envs/vllm_shc/lib/python3.8/site-packages/vllm/worker/worker.py", line 343, in init_worker_distributed_environment
    init_distributed_environment(parallel_config.world_size, rank,
  File "/GlobalData/rijian.lrj/miniconda3/envs/vllm_shc/lib/python3.8/site-packages/vllm/distributed/parallel_state.py", line 812, in init_distributed_environment
    torch.distributed.init_process_group(
  File "/GlobalData/rijian.lrj/miniconda3/envs/vllm_shc/lib/python3.8/site-packages/torch/distributed/c10d_logger.py", line 75, in wrapper
    return func(*args, **kwargs)
  File "/GlobalData/rijian.lrj/miniconda3/envs/vllm_shc/lib/python3.8/site-packages/torch/distributed/c10d_logger.py", line 89, in wrapper
    func_return = func(*args, **kwargs)
  File "/GlobalData/rijian.lrj/miniconda3/envs/vllm_shc/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1305, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
  File "/GlobalData/rijian.lrj/miniconda3/envs/vllm_shc/lib/python3.8/site-packages/torch/distributed/rendezvous.py", line 199, in _tcp_rendezvous_handler
    store = _create_c10d_store(result.hostname, result.port, rank, world_size, timeout, use_libuv)
  File "/GlobalData/rijian.lrj/miniconda3/envs/vllm_shc/lib/python3.8/site-packages/torch/distributed/rendezvous.py", line 174, in _create_c10d_store
    return TCPStore(
torch.distributed.DistNetworkError: The client socket has timed out after 600s while trying to connect to (11.88.234.70, 36893).
This looks like an address-family problem. The server socket tried to start on IPv6 and failed (and the log does not show it starting on IPv4 instead):
[W socket.cpp:464] [c10d] The server socket cannot be initialized on [::]:36893 (errno: 97 - Address family not supported by protocol).
But the client then tries to connect to that port over IPv4:
[E socket.cpp:957] [c10d] The client socket has timed out after 600s while trying to connect to (11.88.234.70, 36893).
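To check this diagnosis on the affected machine, here is a small standard-library sketch that tries to bind a TCP socket for each address family, roughly the step the c10d server socket fails at in the warning above. It does not touch vLLM or PyTorch. `probe_family` is a hypothetical helper name, and binding to port 0 (any free port) is an assumption made so the check does not collide with 36893.

```python
import errno
import socket


def probe_family(family: int, host: str):
    """Try to bind a TCP socket of `family` to `host` on a free port.

    Returns (True, None) on success, or (False, errno) if the bind fails.
    """
    try:
        with socket.socket(family, socket.SOCK_STREAM) as sock:
            sock.bind((host, 0))  # port 0 lets the OS pick any free port
        return True, None
    except OSError as exc:
        return False, exc.errno


if __name__ == "__main__":
    checks = [("IPv4", socket.AF_INET, "0.0.0.0"), ("IPv6", socket.AF_INET6, "::")]
    for name, family, host in checks:
        ok, err = probe_family(family, host)
        if ok:
            print(f"{name}: OK")
        else:
            # On Linux, errno 97 is EAFNOSUPPORT, the code in the c10d warnings
            print(f"{name}: failed, errno {err} ({errno.errorcode.get(err, '?')})")
```

If IPv6 reports errno 97 while IPv4 reports OK, the host matches the log above.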
You can try changing the IP by setting the environment variable VLLM_HOST_IP.
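A minimal sketch of setting it from inside vllm_test.py itself, assuming 127.0.0.1 (loopback) is an acceptable address for a single-node, single-GPU run; that value is a guess, not a confirmed fix. `set_vllm_host_ip` and `build_llm` are hypothetical helpers, and the `LLM(...)` call copies line 32 from the traceback. Setting the variable inside Python also sidesteps `sudo`, which typically resets the environment, so a variable exported in the calling shell may never reach the script.

```python
import ipaddress
import os


def set_vllm_host_ip(ip: str = "127.0.0.1") -> str:
    """Export VLLM_HOST_IP, accepting only IPv4 addresses.

    The log shows IPv6 is unavailable on this host (errno 97), so this
    sketch rejects IPv6 addresses outright.
    """
    addr = ipaddress.ip_address(ip)  # raises ValueError for malformed input
    if addr.version != 4:
        raise ValueError(f"expected an IPv4 address, got {ip!r}")
    os.environ["VLLM_HOST_IP"] = str(addr)
    return os.environ["VLLM_HOST_IP"]


def build_llm(path: str):
    """Hypothetical helper mirroring line 32 of vllm_test.py."""
    set_vllm_host_ip("127.0.0.1")
    # Import vLLM only after the variable is set, so the engine sees it.
    import torch
    from vllm import LLM

    return LLM(model=path, trust_remote_code=True,
               tensor_parallel_size=1, dtype=torch.float16)
```

Validating the address up front means a typo fails immediately instead of surfacing as another 600 s rendezvous timeout.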
@DC-Shi Hello, I changed VLLM_HOST_IP to, for example, 0.0.0.0, but it still fails. May I ask how you changed VLLM_HOST_IP?
Your current environment
Problem
🐛 Describe the bug
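The code above runs without problems, but when I change tensor_parallel_size from 2 to 1 to deploy offline inference on a single GPU, execution reaches
this step, prints only the following, and then hangs with no response and no error: $sudo CUDA_VISIBLE_DEVICES=0 PYTHONPATH="/GlobalData/rijian.lrj/miniconda3/envs/vllm_shc/lib/python3.8/site-packages/" python vllm_test.py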
WARNING 08-05 11:02:58 config.py:1425] Casting torch.bfloat16 to torch.float16.
INFO 08-05 11:02:58 llm_engine.py:176] Initializing an LLM engine (v0.5.3.post1) with config: model='/ProjectRoot/long_content_LLM/qwen/Qwen2-1___5B-Instruct', speculative_config=None, tokenizer='/ProjectRoot/long_content_LLM/qwen/Qwen2-1___5B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None), seed=0, served_model_name=/ProjectRoot/long_content_LLM/qwen/Qwen2-1___5B-Instruct, use_v2_block_manager=False, enable_prefix_caching=False)
INFO 08-05 11:02:59 selector.py:151] Cannot use FlashAttention-2 backend for Volta and Turing GPUs.
INFO 08-05 11:02:59 selector.py:54] Using XFormers backend.
[W socket.cpp:464] [c10d] The server socket cannot be initialized on [::]:36893 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:697] [c10d] The client socket cannot be initialized to connect to [11-88-234-70.gpu-exporter.prometheus.svc.cluster.local]:36893 (errno: 97 - Address family not supported by protocol).
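The last lines above are the final output shown; after that nothing happens and no error is raised. How can I solve this? Please help!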
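Judging by the log at the top of this thread, c10d waits up to 600 s for the rendezvous before raising, so in the meantime the process just looks hung. A stdlib-only sketch to see where it is blocked: arm Python's `faulthandler` watchdog before creating the engine, and it prints every thread's Python stack after the given delay. `arm_hang_dump` and `disarm_hang_dump` are hypothetical helpers, and the 120 s delay is an arbitrary assumption.

```python
import faulthandler
import sys


def arm_hang_dump(seconds: float = 120.0) -> None:
    """Print every thread's Python stack to the real stderr after `seconds`.

    faulthandler uses its own watchdog thread, so the dump still appears
    while the main thread is blocked (for example inside LLM(...)).
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # sys.__stderr__ is the process's original stderr, backed by a real fd.
    faulthandler.dump_traceback_later(seconds, repeat=False,
                                      file=sys.__stderr__, exit=False)


def disarm_hang_dump() -> None:
    """Cancel the pending dump once the engine has been created."""
    faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    arm_hang_dump(120.0)
    # llm = LLM(...)  # the call from line 32 of vllm_test.py would go here
    disarm_hang_dump()
```

If the innermost frames in the dump are in torch/distributed/rendezvous.py, it is the same TCPStore rendezvous stall shown in the traceback above.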