vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: Qwen-QwQ-32B-Preview vllm 0.6.4-post1 huggingface_hub.errors.HFValidationError: #10833

Closed xiezhipeng-git closed 7 hours ago

xiezhipeng-git commented 8 hours ago

Your current environment

The output of `python collect_env.py`:

```text
aimo2/collect_env.py
Collecting environment information...
PyTorch version: 2.5.1+cu124
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A

OS: Ubuntu 24.04.1 LTS (x86_64)
GCC version: (Ubuntu 13.2.0-23ubuntu4) 13.2.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.39

Python version: 3.12.7 | packaged by Anaconda, Inc. | (main, Oct 4 2024, 13:27:36) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.39
Is CUDA available: True
CUDA runtime version: 12.6.77
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4090
Nvidia driver version: 560.94
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.9.5.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv.so.9.5.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn.so.9.5.1
/usr/lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.5.1
/usr/lib/x86_64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.5.1
/usr/lib/x86_64-linux-gnu/libcudnn_graph.so.9.5.1
/usr/lib/x86_64-linux-gnu/libcudnn_heuristic.so.9.5.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops.so.9.5.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Vendor ID: GenuineIntel
Model name: 13th Gen Intel(R) Core(TM) i9-13900KS
CPU family: 6
Model: 183
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
Stepping: 1
BogoMIPS: 6374.40
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq vmx ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves avx_vnni umip waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize flush_l1d arch_capabilities
Virtualization: VT-x
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 768 KiB (16 instances)
L1i cache: 512 KiB (16 instances)
L2 cache: 32 MiB (16 instances)
L3 cache: 36 MiB (1 instance)
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Mitigation; Enhanced IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] flake8==7.0.0
[pip3] mypy==1.11.2
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] numpydoc==1.7.0
[pip3] nvidia-cublas-cu12==12.4.5.8
[pip3] nvidia-cuda-cupti-cu12==12.4.127
[pip3] nvidia-cuda-nvrtc-cu12==12.4.127
[pip3] nvidia-cuda-runtime-cu12==12.4.127
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.2.1.3
[pip3] nvidia-curand-cu12==10.3.5.147
[pip3] nvidia-cusolver-cu12==11.6.1.9
[pip3] nvidia-cusparse-cu12==12.3.1.170
[pip3] nvidia-ml-py==12.560.30
[pip3] nvidia-nccl-cu12==2.21.5
[pip3] nvidia-nvjitlink-cu12==12.4.127
[pip3] nvidia-nvtx-cu12==12.4.127
[pip3] pyzmq==25.1.2
[pip3] torch==2.5.1
[pip3] torchaudio==2.4.0
[pip3] torchvision==0.20.1
[pip3] transformers==4.46.1
[pip3] triton==3.1.0
[conda] _anaconda_depends 2024.10 py312_mkl_0
[conda] blas 1.0 mkl
[conda] cuda-cudart 12.4.127 0 nvidia
[conda] cuda-cupti 12.4.127 0 nvidia
[conda] cuda-libraries 12.4.1 0 nvidia
[conda] cuda-nvrtc 12.4.127 0 nvidia
[conda] cuda-nvtx 12.4.127 0 nvidia
[conda] cuda-opencl 12.6.77 0 nvidia
[conda] cuda-runtime 12.4.1 0 nvidia
[conda] cuda-version 12.6 3 nvidia
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] libcublas 12.4.5.8 0 nvidia
[conda] libcufft 11.2.1.3 0 nvidia
[conda] libcufile 1.11.1.6 0 nvidia
[conda] libcurand 10.3.7.77 0 nvidia
[conda] libcusolver 11.6.1.9 0 nvidia
[conda] libcusparse 12.3.1.170 0 nvidia
[conda] libjpeg-turbo 2.0.0 h9bf148f_0 pytorch
[conda] libnpp 12.2.5.30 0 nvidia
[conda] libnvfatbin 12.6.77 0 nvidia
[conda] libnvjitlink 12.4.127 0 nvidia
[conda] libnvjpeg 12.3.1.117 0 nvidia
[conda] mkl 2023.1.0 h213fc3f_46344
[conda] mkl-service 2.4.0 py312h5eee18b_1
[conda] mkl_fft 1.3.10 py312h5eee18b_0
[conda] mkl_random 1.2.7 py312h526ad5a_0
[conda] numpy 1.26.4 py312hc5e2394_0
[conda] numpy-base 1.26.4 py312h0da6c21_0
[conda] numpydoc 1.7.0 py312h06a4308_0
[conda] nvidia-cublas-cu12 12.4.5.8 pypi_0 pypi
[conda] nvidia-cuda-cupti-cu12 12.4.127 pypi_0 pypi
[conda] nvidia-cuda-nvrtc-cu12 12.4.127 pypi_0 pypi
[conda] nvidia-cuda-runtime-cu12 12.4.127 pypi_0 pypi
[conda] nvidia-cudnn-cu12 9.1.0.70 pypi_0 pypi
[conda] nvidia-cufft-cu12 11.2.1.3 pypi_0 pypi
[conda] nvidia-curand-cu12 10.3.5.147 pypi_0 pypi
[conda] nvidia-cusolver-cu12 11.6.1.9 pypi_0 pypi
[conda] nvidia-cusparse-cu12 12.3.1.170 pypi_0 pypi
[conda] nvidia-ml-py 12.560.30 pypi_0 pypi
[conda] nvidia-nccl-cu12 2.21.5 pypi_0 pypi
[conda] nvidia-nvjitlink-cu12 12.4.127 pypi_0 pypi
[conda] nvidia-nvtx-cu12 12.4.127 pypi_0 pypi
[conda] pytorch-cuda 12.4 hc786d27_7 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] pyzmq 25.1.2 py312h6a678d5_0
[conda] torch 2.5.1 pypi_0 pypi
[conda] torchaudio 2.4.0 py312_cu124 pytorch
[conda] torchvision 0.20.1 pypi_0 pypi
[conda] transformers 4.46.1 pypi_0 pypi
[conda] triton 3.0.0 pypi_0 pypi

ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.6.4.post1
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
      GPU0  CPU Affinity  NUMA Affinity  GPU NUMA ID
GPU0   X                                 N/A

Legend:
  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

LD_LIBRARY_PATH=/root/anaconda3/lib/python3.12/site-packages/cv2/../../lib64:/usr/local/cuda-12.6/lib64:/usr/local/cuda-12.6/lib64
MKL_THREADING_LAYER=GNU
MKL_SERVICE_FORCE_INTEL=1
CUDA_MODULE_LOADING=LAZY
```

Model Input Dumps

```text
INFO 12-02 23:29:11 api_server.py:585] vLLM API server version 0.6.4.post1
INFO 12-02 23:29:11 api_server.py:586] args: Namespace(host='0.0.0.0', port=45001, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template=None, response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_auto_tool_choice=False, tool_call_parser=None, tool_parser_plugin='', model='D:\\Users\\Admin\\.cache\\modelscope\\hub\\qwen\\Qwen-QwQ-32B-Preview-GGUF\\QwQ-32B-Preview-IQ2_XS.gguf', task='auto', tokenizer=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', chat_template_text_format='string', trust_remote_code=True, allowed_local_media_path=None, download_dir=None, load_format='auto', config_format=<ConfigFormat.AUTO: 'auto'>, dtype='auto', kv_cache_dtype='auto', quantization_param_path=None, max_model_len=8192, guided_decoding_backend='outlines', distributed_executor_backend=None, worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=1, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=16, enable_prefix_caching=False, disable_sliding_window=False, use_v2_block_manager=False, num_lookahead_slots=0, seed=0, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.96, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_seqs=12, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_overrides=None, enforce_eager=True, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, mm_processor_kwargs=None, enable_lora=False, enable_lora_bias=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, multi_step_stream_outputs=True, scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_model=None, speculative_model_quantization=None, num_speculative_tokens=None, speculative_disable_mqa_scorer=False, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, disable_logprobs_during_spec_decoding=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=['1'], qlora_adapter_name_or_path=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', override_neuron_config=None, override_pooler_config=None, disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False)
INFO 12-02 23:29:11 api_server.py:175] Multiprocessing frontend to use ipc:///tmp/8fa5abb3-77cb-480d-93b0-7606d6c535ea for IPC Path.
INFO 12-02 23:29:11 api_server.py:194] Started engine process with PID 47559
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/root/anaconda3/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 643, in <module>
uvloop.run(run_server(args))
File "/root/anaconda3/lib/python3.12/site-packages/uvloop/__init__.py", line 109, in run
return __asyncio.run(
^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.12/asyncio/runners.py", line 194, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
File "/root/anaconda3/lib/python3.12/site-packages/uvloop/__init__.py", line 61, in wrapper
return await main
^^^^^^^^^^
File "/root/anaconda3/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 609, in run_server
async with build_async_engine_client(args) as engine_client:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.12/contextlib.py", line 210, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 113, in build_async_engine_client
async with build_async_engine_client_from_engine_args(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.12/contextlib.py", line 210, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 197, in build_async_engine_client_from_engine_args
engine_config = engine_args.create_engine_config()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.12/site-packages/vllm/engine/arg_utils.py", line 959, in create_engine_config
model_config = self.create_model_config()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.12/site-packages/vllm/engine/arg_utils.py", line 891, in create_model_config
return ModelConfig(
^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.12/site-packages/vllm/config.py", line 208, in __init__
hf_config = get_config(self.model, trust_remote_code, revision,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.12/site-packages/vllm/transformers_utils/config.py", line 161, in get_config
if is_gguf or file_or_path_exists(
^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.12/site-packages/vllm/transformers_utils/config.py", line 81, in file_or_path_exists
cached_filepath = try_to_load_from_cache(repo_id=model,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 106, in _inner_fn
validate_repo_id(arg_value)
File "/root/anaconda3/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 160, in validate_repo_id
raise HFValidationError(
huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'D:\Users\Admin\.cache\modelscope\hub\qwen\Qwen-QwQ-32B-Preview-GGUF\QwQ-32B-Preview-IQ2_XS.gguf'.
ERROR 12-02 23:29:13 engine.py:366] Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'D:\Users\Admin\.cache\modelscope\hub\qwen\Qwen-QwQ-32B-Preview-GGUF\QwQ-32B-Preview-IQ2_XS.gguf'.
ERROR 12-02 23:29:13 engine.py:366] Traceback (most recent call last):
ERROR 12-02 23:29:13 engine.py:366]   File "/root/anaconda3/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 357, in run_mp_engine
ERROR 12-02 23:29:13 engine.py:366]     engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
ERROR 12-02 23:29:13 engine.py:366]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-02 23:29:13 engine.py:366]   File "/root/anaconda3/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 114, in from_engine_args
ERROR 12-02 23:29:13 engine.py:366]     engine_config = engine_args.create_engine_config()
ERROR 12-02 23:29:13 engine.py:366]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-02 23:29:13 engine.py:366]   File "/root/anaconda3/lib/python3.12/site-packages/vllm/engine/arg_utils.py", line 959, in create_engine_config
ERROR 12-02 23:29:13 engine.py:366]     model_config = self.create_model_config()
ERROR 12-02 23:29:13 engine.py:366]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-02 23:29:13 engine.py:366]   File "/root/anaconda3/lib/python3.12/site-packages/vllm/engine/arg_utils.py", line 891, in create_model_config
ERROR 12-02 23:29:13 engine.py:366]     return ModelConfig(
ERROR 12-02 23:29:13 engine.py:366]            ^^^^^^^^^^^^
ERROR 12-02 23:29:13 engine.py:366]   File "/root/anaconda3/lib/python3.12/site-packages/vllm/config.py", line 208, in __init__
ERROR 12-02 23:29:13 engine.py:366]     hf_config = get_config(self.model, trust_remote_code, revision,
ERROR 12-02 23:29:13 engine.py:366]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-02 23:29:13 engine.py:366]   File "/root/anaconda3/lib/python3.12/site-packages/vllm/transformers_utils/config.py", line 161, in get_config
ERROR 12-02 23:29:13 engine.py:366]     if is_gguf or file_or_path_exists(
ERROR 12-02 23:29:13 engine.py:366]                   ^^^^^^^^^^^^^^^^^^^^
ERROR 12-02 23:29:13 engine.py:366]   File "/root/anaconda3/lib/python3.12/site-packages/vllm/transformers_utils/config.py", line 81, in file_or_path_exists
ERROR 12-02 23:29:13 engine.py:366]     cached_filepath = try_to_load_from_cache(repo_id=model,
ERROR 12-02 23:29:13 engine.py:366]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-02 23:29:13 engine.py:366]   File "/root/anaconda3/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 106, in _inner_fn
ERROR 12-02 23:29:13 engine.py:366]     validate_repo_id(arg_value)
ERROR 12-02 23:29:13 engine.py:366]   File "/root/anaconda3/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 160, in validate_repo_id
ERROR 12-02 23:29:13 engine.py:366]     raise HFValidationError(
ERROR 12-02 23:29:13 engine.py:366] huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'D:\Users\Admin\.cache\modelscope\hub\qwen\Qwen-QwQ-32B-Preview-GGUF\QwQ-32B-Preview-IQ2_XS.gguf'.
Process SpawnProcess-1:
Traceback (most recent call last):
File "/root/anaconda3/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/root/anaconda3/lib/python3.12/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/root/anaconda3/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 368, in run_mp_engine
raise e
File "/root/anaconda3/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 357, in run_mp_engine
engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 114, in from_engine_args
engine_config = engine_args.create_engine_config()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.12/site-packages/vllm/engine/arg_utils.py", line 959, in create_engine_config
model_config = self.create_model_config()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.12/site-packages/vllm/engine/arg_utils.py", line 891, in create_model_config
return ModelConfig(
^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.12/site-packages/vllm/config.py", line 208, in __init__
hf_config = get_config(self.model, trust_remote_code, revision,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.12/site-packages/vllm/transformers_utils/config.py", line 161, in get_config
if is_gguf or file_or_path_exists(
^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.12/site-packages/vllm/transformers_utils/config.py", line 81, in file_or_path_exists
cached_filepath = try_to_load_from_cache(repo_id=model,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 106, in _inner_fn
validate_repo_id(arg_value)
File "/root/anaconda3/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 160, in validate_repo_id
raise HFValidationError(
huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'D:\Users\Admin\.cache\modelscope\hub\qwen\Qwen-QwQ-32B-Preview-GGUF\QwQ-32B-Preview-IQ2_XS.gguf'.
```
```text
HFValidationError                         Traceback (most recent call last)
Cell In[4], line 16
     11     torch.cuda.empty_cache()
     14 llm_model_pth = MODLE_PATH
---> 16 llm = LLM(
     17     llm_model_pth,
     18     dtype="half",                # The data type for the model weights and activations
     19     max_num_seqs=12,             # Maximum number of sequences per iteration. Default is 256
     20     max_model_len=8192,          # Model context length
     21     trust_remote_code=True,      # Trust remote code (e.g., from HuggingFace) when downloading the model and tokenizer
     22     tensor_parallel_size=PARALLERL_NUM,  # The number of GPUs to use for distributed execution with tensor parallelism
     23     gpu_memory_utilization=0.97, # The ratio (between 0 and 1) of GPU memory to reserve for the model
     24 )

File ~/anaconda3/lib/python3.12/site-packages/vllm/utils.py:1028, in deprecate_args.<locals>.wrapper.<locals>.inner(*args, **kwargs)
   1021             msg += f" {additional_message}"
   1023         warnings.warn(
   1024             DeprecationWarning(msg),
   1025             stacklevel=3,  # The inner function takes up one level
   1026         )
-> 1028 return fn(*args, **kwargs)

File ~/anaconda3/lib/python3.12/site-packages/vllm/entrypoints/llm.py:210, in LLM.__init__(self, model, tokenizer, tokenizer_mode, skip_tokenizer_init, trust_remote_code, allowed_local_media_path, tensor_parallel_size, dtype, quantization, revision, tokenizer_revision, seed, gpu_memory_utilization, swap_space, cpu_offload_gb, enforce_eager, max_seq_len_to_capture, disable_custom_all_reduce, disable_async_output_proc, hf_overrides, mm_processor_kwargs, task, override_pooler_config, **kwargs)
    207 self.engine_class = self.get_engine_class()
    209 # TODO(rob): enable mp by default (issue with fork vs spawn)
--> 210 self.llm_engine = self.engine_class.from_engine_args(
    211     engine_args, usage_context=UsageContext.LLM_CLASS)
    213 self.request_counter = Counter()

File ~/anaconda3/lib/python3.12/site-packages/vllm/engine/llm_engine.py:582, in LLMEngine.from_engine_args(cls, engine_args, usage_context, stat_loggers)
    580 """Creates an LLM engine from the engine arguments."""
    581 # Create the engine configs.
--> 582 engine_config = engine_args.create_engine_config()
    583 executor_class = cls._get_executor_cls(engine_config)
    584 # Create the LLM engine.

File ~/anaconda3/lib/python3.12/site-packages/vllm/engine/arg_utils.py:959, in EngineArgs.create_engine_config(self)
    954 assert self.cpu_offload_gb >= 0, (
    955     "CPU offload space must be non-negative"
    956     f", but got {self.cpu_offload_gb}")
    958 device_config = DeviceConfig(device=self.device)
--> 959 model_config = self.create_model_config()
    961 if model_config.is_multimodal_model:
    962     if self.enable_prefix_caching:

File ~/anaconda3/lib/python3.12/site-packages/vllm/engine/arg_utils.py:891, in EngineArgs.create_model_config(self)
    890 def create_model_config(self) -> ModelConfig:
--> 891     return ModelConfig(
    892         model=self.model,
    893         task=self.task,
    894         # We know this is not None because we set it in __post_init__
    895         tokenizer=cast(str, self.tokenizer),
    896         tokenizer_mode=self.tokenizer_mode,
    897         chat_template_text_format=self.chat_template_text_format,
    898         trust_remote_code=self.trust_remote_code,
    899         allowed_local_media_path=self.allowed_local_media_path,
    900         dtype=self.dtype,
    901         seed=self.seed,
    902         revision=self.revision,
    903         code_revision=self.code_revision,
    904         rope_scaling=self.rope_scaling,
    905         rope_theta=self.rope_theta,
    906         hf_overrides=self.hf_overrides,
    907         tokenizer_revision=self.tokenizer_revision,
    908         max_model_len=self.max_model_len,
    909         quantization=self.quantization,
    910         quantization_param_path=self.quantization_param_path,
    911         enforce_eager=self.enforce_eager,
    912         max_seq_len_to_capture=self.max_seq_len_to_capture,
    913         max_logprobs=self.max_logprobs,
    914         disable_sliding_window=self.disable_sliding_window,
    915         skip_tokenizer_init=self.skip_tokenizer_init,
    916         served_model_name=self.served_model_name,
    917         limit_mm_per_prompt=self.limit_mm_per_prompt,
    918         use_async_output_proc=not self.disable_async_output_proc,
    919         config_format=self.config_format,
    920         mm_processor_kwargs=self.mm_processor_kwargs,
    921         override_neuron_config=self.override_neuron_config,
    922         override_pooler_config=self.override_pooler_config,
    923     )

File ~/anaconda3/lib/python3.12/site-packages/vllm/config.py:208, in ModelConfig.__init__(self, model, task, tokenizer, tokenizer_mode, trust_remote_code, dtype, seed, allowed_local_media_path, revision, code_revision, rope_scaling, rope_theta, tokenizer_revision, max_model_len, spec_target_max_model_len, quantization, quantization_param_path, enforce_eager, max_seq_len_to_capture, max_logprobs, disable_sliding_window, skip_tokenizer_init, served_model_name, limit_mm_per_prompt, use_async_output_proc, config_format, chat_template_text_format, hf_overrides, mm_processor_kwargs, override_neuron_config, override_pooler_config)
    205 self.disable_sliding_window = disable_sliding_window
    206 self.skip_tokenizer_init = skip_tokenizer_init
--> 208 hf_config = get_config(self.model, trust_remote_code, revision,
    209                        code_revision, config_format, **hf_overrides_kw)
    210 hf_config = hf_overrides_fn(hf_config)
    211 self.hf_config = hf_config

File ~/anaconda3/lib/python3.12/site-packages/vllm/transformers_utils/config.py:161, in get_config(model, trust_remote_code, revision, code_revision, config_format, token, **kwargs)
    158     model = Path(model).parent
    160 if config_format == ConfigFormat.AUTO:
--> 161     if is_gguf or file_or_path_exists(
    162             model, HF_CONFIG_NAME, revision=revision, token=token):
    163         config_format = ConfigFormat.HF
    164     elif file_or_path_exists(model,
    165                              MISTRAL_CONFIG_NAME,
    166                              revision=revision,
    167                              token=token):

File ~/anaconda3/lib/python3.12/site-packages/vllm/transformers_utils/config.py:81, in file_or_path_exists(model, config_name, revision, token)
     78     return (Path(model) / config_name).is_file()
     80 # Offline mode support: Check if config file is cached already
---> 81 cached_filepath = try_to_load_from_cache(repo_id=model,
     82                                          filename=config_name,
     83                                          revision=revision)
     84 if isinstance(cached_filepath, str):
     85     # The config file exists in cache- we can continue trying to load
     86     return True

File ~/anaconda3/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py:106, in validate_hf_hub_args.<locals>._inner_fn(*args, **kwargs)
    101 for arg_name, arg_value in chain(
    102     zip(signature.parameters, args),  # Args values
    103     kwargs.items(),  # Kwargs values
    104 ):
    105     if arg_name in ["repo_id", "from_id", "to_id"]:
--> 106         validate_repo_id(arg_value)
    108     elif arg_name == "token" and arg_value is not None:
    109         has_token = True

File ~/anaconda3/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py:160, in validate_repo_id(repo_id)
    154     raise HFValidationError(
    155         "Repo id must be in the form 'repo_name' or 'namespace/repo_name':"
    156         f" '{repo_id}'. Use `repo_type` argument if needed."
    157     )
    159 if not REPO_ID_REGEX.match(repo_id):
--> 160     raise HFValidationError(
    161         "Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are"
    162         " forbidden, '-' and '.' cannot start or end the name, max length is 96:"
    163         f" '{repo_id}'."
    164     )
    166 if "--" in repo_id or ".." in repo_id:
    167     raise HFValidationError(f"Cannot have -- or .. in repo_id: '{repo_id}'.")

HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'D:\Users\Admin\.cache\modelscope\hub\qwen\Qwen-QwQ-32B-Preview-GGUF\QwQ-32B-Preview-IQ2_XS.gguf'.
```

🐛 Describe the bug

```bash
python -m vllm.entrypoints.openai.api_server \
    --model="D:\Users\Admin\.cache\modelscope\hub\qwen\Qwen-QwQ-32B-Preview-GGUF\QwQ-32B-Preview-IQ2_XS.gguf" \
    --served-model-name="1" \
    --trust-remote-code \
    --host=0.0.0.0 \
    --port=45001 \
    --tensor-parallel-size=1 \
    --gpu-memory-utilization=0.96 \
    --max-num-seqs=12 \
    --enforce-eager \
    --max-model-len=8192
```

With Qwen-QwQ-32B-Preview on vLLM 0.6.4.post1, this issue occurs in every configuration I tried: the OpenAI API server with the GGUF model, and the AWQ model on Kaggle, whether served through the API or loaded locally. The full-size model fails with a different error: https://www.modelscope.cn/models/Qwen/QwQ-32B-Preview/feedback/issueDetail/18860. On Kaggle the AWQ model ran successfully with vLLM 0.6.3.post1.
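Reading the traceback, the failure appears to be about path handling rather than the model itself: on Linux/WSL a Windows-style path like `D:\...` is never recognized as an existing local file, so vLLM falls through to treating the string as a Hugging Face repo id, and `huggingface_hub` rejects the backslashes. A minimal sketch that reproduces the validation error (assuming only that `huggingface_hub` is installed; the path is the one from the logs):

```python
from pathlib import Path
from huggingface_hub import try_to_load_from_cache
from huggingface_hub.errors import HFValidationError

win_path = r"D:\Users\Admin\.cache\modelscope\hub\qwen\Qwen-QwQ-32B-Preview-GGUF\QwQ-32B-Preview-IQ2_XS.gguf"

# On Linux, backslashes are not path separators, so the Windows path
# is not detected as a local file...
print(Path(win_path).is_file())  # False under WSL

# ...and the string then gets validated as a Hub repo id, which raises
# the same HFValidationError seen in the logs above.
try:
    try_to_load_from_cache(repo_id=win_path, filename="config.json")
except HFValidationError as err:
    print(err)
```

In other words, the `HFValidationError` is about the string failing repo-id validation, not about the GGUF file itself.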


xiezhipeng-git commented 8 hours ago

@DarkLight1337

DarkLight1337 commented 8 hours ago

We don't officially support Windows. Can you run this with a Linux file path?

Isotr0py commented 8 hours ago

model='D:\Users\Admin\.cache\modelscope\hub\qwen\Qwen-QwQ-32B-Preview-GGUF\QwQ-32B-Preview-IQ2_XS.gguf'

If you are running vLLM on WSL, to access the model on your Windows file system you should use /mnt/d/..., so I guess the model path should be '/mnt/d/Users/Admin/.cache/modelscope/hub/qwen/Qwen-QwQ-32B-Preview-GGUF/QwQ-32B-Preview-IQ2_XS.gguf'.
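For illustration, a minimal sketch of that drive-letter conversion (`win_to_wsl_path` is a hypothetical helper, not part of vLLM or WSL):

```python
from pathlib import PureWindowsPath

def win_to_wsl_path(win_path: str) -> str:
    """Map a Windows path like 'D:\\dir\\file' to its WSL mount
    point '/mnt/d/dir/file'. Hypothetical helper for illustration."""
    p = PureWindowsPath(win_path)
    drive = p.drive.rstrip(":").lower()  # 'D:' -> 'd'
    return "/".join(["/mnt", drive, *p.parts[1:]])

print(win_to_wsl_path(
    r"D:\Users\Admin\.cache\modelscope\hub\qwen"
    r"\Qwen-QwQ-32B-Preview-GGUF\QwQ-32B-Preview-IQ2_XS.gguf"))
# /mnt/d/Users/Admin/.cache/modelscope/hub/qwen/Qwen-QwQ-32B-Preview-GGUF/QwQ-32B-Preview-IQ2_XS.gguf
```

WSL also ships a `wslpath` utility that does the same conversion from the shell (`wslpath 'D:\...'`); either form of the `/mnt/d/...` path can then be passed to `--model`.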

xiezhipeng-git commented 7 hours ago

model='D:\Users\Admin\.cache\modelscope\hub\qwen\Qwen-QwQ-32B-Preview-GGUF\QwQ-32B-Preview-IQ2_XS.gguf'

If you are running vLLM on WSL, to access the model on your Windows file system you should use /mnt/d/..., so I guess the model path should be '/mnt/d/Users/Admin/.cache/modelscope/hub/qwen/Qwen-QwQ-32B-Preview-GGUF/QwQ-32B-Preview-IQ2_XS.gguf'.

You are right. I forgot that. I'm sorry.