2023-12-19 07:10:25,057 INFO worker.py:1673 -- Started a local Ray instance.
INFO 12-19 07:10:27 llm_engine.py:73] Initializing an LLM engine with config: model='/Models/Qwen-72B-Chat', tokenizer='/Models/Qwen-72B-Chat', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=2, quantization=None, enforce_eager=False, seed=0)
WARNING 12-19 07:10:28 tokenizer.py:62] Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.
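Problem description
When vLLM is launched on GPUs 2 and 3, startup hangs and makes no progress. The pairs 1 and 2, and 1 and 3, both start without problems.
CUDA version: 12.1.0, driver version: 535.54.03, torch: 2.1.2, fschat: 0.2.34, vllm: 0.2.6, ray: 2.8.1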
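Only the 2+3 pair hangs, so one way to narrow this down is to rerun the same launch command on every GPU pair with a timeout. A hang then shows up as a result instead of a stuck terminal. This is a minimal sketch with a few assumptions. It assumes GPUs are selected through CUDA_VISIBLE_DEVICES, since the actual launch command is not shown in this report. It assumes "GPU 2/3" uses nvidia-smi's numbering, which is why CUDA_DEVICE_ORDER=PCI_BUS_ID is set. The script name `bisect_gpu_pairs.py`, the helper names, and the 600 s timeout are hypothetical.

```python
import itertools
import os
import subprocess
import sys
from typing import Dict, List, Sequence, Tuple


def gpu_pairs(gpu_ids: Sequence[int]) -> List[Tuple[int, int]]:
    """Every unordered pair of GPU ids, e.g. [1, 2, 3] -> [(1, 2), (1, 3), (2, 3)]."""
    return list(itertools.combinations(gpu_ids, 2))


def env_for_pair(pair: Tuple[int, int], base: Dict[str, str]) -> Dict[str, str]:
    """Copy of `base` that exposes only the two GPUs in `pair` to the child process."""
    env = dict(base)
    # Enumerate devices in PCI bus order so the ids match nvidia-smi;
    # CUDA's default order (FASTEST_FIRST) can differ from it.
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in pair)
    return env


def run_pair(cmd: List[str], pair: Tuple[int, int], timeout_s: float) -> str:
    """Run `cmd` on `pair`; return "ok", "failed (rc=N)", or "timeout" (likely a hang)."""
    try:
        proc = subprocess.run(
            cmd, env=env_for_pair(pair, dict(os.environ)), timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child on timeout; Ray workers it
        # started may survive and need separate cleanup (e.g. `ray stop`).
        return "timeout"
    return "ok" if proc.returncode == 0 else f"failed (rc={proc.returncode})"


if __name__ == "__main__":
    # HYPOTHETICAL usage: python bisect_gpu_pairs.py <launch command...>
    # Pass the real launch command from this report as the arguments.
    if len(sys.argv) < 2:
        print("usage: bisect_gpu_pairs.py <launch command...>")
    else:
        for pair in gpu_pairs([1, 2, 3]):
            print(pair, run_pair(sys.argv[1:], pair, timeout_s=600.0))
```

If only (2, 3) reports `timeout` while the other pairs report `ok`, that points at something specific to that GPU pair rather than at the model or config. Loading a 72B model can take a long time, so the timeout should be generous.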
Launch command
Log output
nvidia-smi