Can anybody help me solve this, please?
Your current environment
🐛 Describe the bug
When I run the following code:
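(The original snippet was posted as a screenshot, so below is a minimal sketch of what it presumably looked like. The model path and dtype are taken from the engine log further down; the prompts, sampling settings, and the original `tensor_parallel_size` value are assumptions.)

```python
from vllm import LLM, SamplingParams

# Hypothetical reconstruction of the offline batch inference script.
# Model path and dtype come from the engine log; everything else is assumed.
llm = LLM(
    model="/ProjectRoot/long_content_LLM/qwen/Qwen2-1_5B-Instruct",
    trust_remote_code=True,
    dtype="bfloat16",        # vLLM casts this to float16 on Volta/Turing GPUs
    tensor_parallel_size=2,  # assumed original value; the hang appears after changing this to 1
)

sampling_params = SamplingParams(temperature=0.8, max_tokens=512)  # assumed values
outputs = llm.generate(["Hello, who are you?"], sampling_params)   # assumed prompt
for output in outputs:
    print(output.outputs[0].text)
```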
the code works without any issues and returns the results I expect. However, when I change `tensor_parallel_size` to 1 (since I intend to perform offline batch inference on a single GPU), the code gets stuck at the `LLM(...)` initialization call and never continues executing. I then checked the GPU memory usage, which was only 309 MB.
Here is the terminal output. The process never throws an error or exits; it just sits here and keeps running indefinitely:
WARNING 08-02 10:42:22 config.py:1425] Casting torch.bfloat16 to torch.float16.
INFO 08-02 10:42:22 llm_engine.py:176] Initializing an LLM engine (v0.5.3.post1) with config: model='/ProjectRoot/long_content_LLM/qwen/Qwen2-1_5B-Instruct', speculative_config=None, tokenizer='/ProjectRoot/long_content_LLM/qwen/Qwen2-1_5B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None), seed=0, served_model_name=/ProjectRoot/long_content_LLM/qwen/Qwen2-1___5B-Instruct, use_v2_block_manager=False, enable_prefix_caching=False)
INFO 08-02 10:42:23 selector.py:151] Cannot use FlashAttention-2 backend for Volta and Turing GPUs.
INFO 08-02 10:42:23 selector.py:54] Using XFormers backend.
[W socket.cpp:464] [c10d] The server socket cannot be initialized on [::]:59759 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:697] [c10d] The client socket cannot be initialized to connect to [11-88-234-70.gpu-exporter.prometheus.svc.cluster.local]:59759 (errno: 97 - Address family not supported by protocol).
How can I resolve this? Any help would be greatly appreciated!