vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

Please help me solve the problem. thanks #1784

Closed: CP3666 closed this issue 3 months ago

CP3666 commented 9 months ago

```
(songdh) [root@localhost server_llm]# python -m vllm.entrypoints.api_server --model $model_path --tokenizer $model_path --tensor-parallel-size $GPUS --dtype auto --port $port --host 0.0.0.0 --gpu-memory-utilization 0.9 --quantization awq --dtype float16 --load-format auto &
[1] 86402
(songdh) [root@localhost server_llm]# WARNING 11-25 10:47:23 config.py:398] Casting torch.bfloat16 to torch.float16.
WARNING 11-25 10:47:23 config.py:140] awq quantization is not fully optimized yet. The speed can be slower than non-quantized models.
INFO 11-25 10:47:23 llm_engine.py:72] Initializing an LLM engine with config: model='/data5/llama/models_hf/13B', tokenizer='/data5/llama/models_hf/13B', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=auto, tensor_parallel_size=1, quantization=awq, seed=0)
INFO 11-25 10:47:23 tokenizer.py:31] For some LLaMA V1 models, initializing the fast tokenizer may take a long time. To reduce the initialization time, consider using 'hf-internal-testing/llama-tokenizer' instead of the original tokenizer.
Traceback (most recent call last):
  File "/root/anaconda3/envs/songdh/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/anaconda3/envs/songdh/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/root/anaconda3/envs/songdh/lib/python3.10/site-packages/vllm/entrypoints/api_server.py", line 80, in <module>
    engine = AsyncLLMEngine.from_engine_args(engine_args)
  File "/root/anaconda3/envs/songdh/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 486, in from_engine_args
    engine = cls(parallel_config.worker_use_ray,
  File "/root/anaconda3/envs/songdh/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 269, in __init__
    self.engine = self._init_engine(*args, **kwargs)
  File "/root/anaconda3/envs/songdh/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 305, in _init_engine
    return engine_class(*args, **kwargs)
  File "/root/anaconda3/envs/songdh/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 110, in __init__
    self._init_workers(distributed_init_method)
  File "/root/anaconda3/envs/songdh/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 142, in _init_workers
    self._run_workers(
  File "/root/anaconda3/envs/songdh/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 700, in _run_workers
    output = executor(*args, **kwargs)
  File "/root/anaconda3/envs/songdh/lib/python3.10/site-packages/vllm/worker/worker.py", line 70, in init_model
    self.model = get_model(self.model_config)
  File "/root/anaconda3/envs/songdh/lib/python3.10/site-packages/vllm/model_executor/model_loader.py", line 67, in get_model
    quant_config = get_quant_config(model_config.quantization,
  File "/root/anaconda3/envs/songdh/lib/python3.10/site-packages/vllm/model_executor/weight_utils.py", line 114, in get_quant_config
    raise ValueError(f"Cannot find the config file for {quantization}")
ValueError: Cannot find the config file for awq
```
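The `ValueError` comes from `get_quant_config`, which looks for a quantization config file alongside the model weights; it fires when `--quantization awq` is passed but the checkpoint carries no AWQ metadata. A quick way to check a local checkpoint is a sketch like the one below (a hypothetical helper, assuming the AutoAWQ-style file names; not verified against this particular model):

```python
# check_awq_metadata.py -- hypothetical sanity check before launching vLLM
# with --quantization awq. Looks for the AWQ metadata that vLLM's
# get_quant_config() expects to find next to the weights.
import json
import os
import sys

model_path = sys.argv[1]  # e.g. /data5/llama/models_hf/13B

found = []

# AutoAWQ-style exports typically ship a quant_config.json next to the weights.
if os.path.exists(os.path.join(model_path, "quant_config.json")):
    found.append("quant_config.json")

# Newer exports embed a "quantization_config" block inside config.json instead.
config_json = os.path.join(model_path, "config.json")
if os.path.exists(config_json):
    with open(config_json) as f:
        if "quantization_config" in json.load(f):
            found.append("config.json: quantization_config")

if found:
    print("AWQ metadata found:", ", ".join(found))
else:
    print("No quantization config found. This checkpoint is probably a plain "
          "fp16/bf16 model; drop --quantization awq or quantize it first.")
```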

777ki commented 9 months ago

Try `--tokenizer-mode=slow`?

kasoushu commented 9 months ago

same

ghost commented 9 months ago

You'd have to specify what model you are trying to load. Maybe the repo doesn't contain the `quant_config.json` file?
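For reference, an AWQ checkpoint exported with AutoAWQ typically carries a `quant_config.json` roughly of this shape (illustrative defaults, shown here as a Python dict; the reporter's checkpoint evidently has no such file):

```python
# Typical AWQ quantization metadata (illustrative values, not from this issue).
awq_quant_config = {
    "zero_point": True,    # asymmetric (zero-point) quantization
    "q_group_size": 128,   # weights quantized in groups of 128
    "w_bit": 4,            # 4-bit weights
    "version": "GEMM",     # kernel variant used by the export
}
```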

acodercat commented 7 months ago

Try adding `--max-model-len 2048`.

hmellor commented 3 months ago

vLLM does not quantize models for you. If the model you are trying to load isn't already quantized, loading it with `--quantization awq` won't work, and that appears to be what is happening here.
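In other words, the checkpoint has to be quantized before serving. A hedged sketch of that step using the AutoAWQ library (paths and settings are illustrative, not taken from this issue):

```python
# Quantize an fp16 LLaMA checkpoint to AWQ with AutoAWQ (pip install autoawq).
# This writes the quantization metadata that vLLM's --quantization awq expects.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "/data5/llama/models_hf/13B"      # original fp16 weights
quant_path = "/data5/llama/models_hf/13B-awq"  # output directory (illustrative)

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Common AWQ settings: 4-bit weights, group size 128.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}
model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```

After that, point `--model` at the quantized directory and the `--quantization awq` flag should find its config file.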