vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: TypeError in benchmark_serving.py when using --model parameter #6069

Open Arthur-g-p opened 4 months ago

Arthur-g-p commented 4 months ago

Your current environment

The output of `python collect_env.py`

PyTorch version: 2.3.1+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22621-SP0
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=1910
DeviceID=CPU0
Family=198
L2CacheSize=1024
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2112
Name=Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz
ProcessorType=3
Revision=

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] optree==0.11.0
[pip3] torch==2.3.1
[pip3] transformers==4.41.2
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: N/A
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
Could not collect

🐛 Describe the bug

When running benchmark_serving.py with the --model parameter, I encountered a TypeError related to the snapshot_download() function.

Error message: TypeError: snapshot_download() got an unexpected keyword argument 'model_id'
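
For reference, the same TypeError reproduces with a direct call outside vLLM (a minimal sketch, assuming huggingface_hub is installed; "facebook/opt-125m" is only a placeholder repo id):

from huggingface_hub import snapshot_download

# Raises: TypeError: snapshot_download() got an unexpected keyword
# argument 'model_id' ("facebook/opt-125m" is just a placeholder).
snapshot_download(model_id="facebook/opt-125m")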

Root cause: The issue lies in the get_model() function defined in backend_request_func.py. This function calls snapshot_download() from either modelscope or huggingface_hub depending on the VLLM_USE_MODELSCOPE environment variable. Code Snippet:

import os

import huggingface_hub.constants  # for the HF_HUB_OFFLINE flag used below


def get_model(pretrained_model_name_or_path: str):
    if os.getenv('VLLM_USE_MODELSCOPE', 'False').lower() == 'true':
        from modelscope import snapshot_download
    else:
        from huggingface_hub import snapshot_download

    # model_id and ignore_file_pattern match modelscope's signature,
    # but not huggingface_hub's:
    model_path = snapshot_download(
        model_id=pretrained_model_name_or_path,
        local_files_only=huggingface_hub.constants.HF_HUB_OFFLINE,
        ignore_file_pattern=[".*.pt", ".*.safetensors", ".*.bin"])
    return model_path

Problem: The snapshot_download() function from huggingface_hub does not accept the model_id and ignore_file_pattern keyword arguments; those names belong to modelscope's API. According to the Hugging Face Hub documentation, model_id should be replaced with repo_id, and ignore_file_pattern with ignore_patterns (see the quick check below).
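
This is easy to confirm against the installed library (a quick check, nothing vLLM-specific):

import inspect
from huggingface_hub import snapshot_download

# The first parameter is repo_id, and file filtering goes through
# allow_patterns / ignore_patterns; there is no model_id or
# ignore_file_pattern in the signature.
print(inspect.signature(snapshot_download))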

In addition, even with the correct keyword name, the pattern ".*.safetensors" still lets the safetensors files be downloaded: ignore_patterns uses shell-style globs, in which "." is a literal dot, so the pattern only matches files whose names start with a dot. It should be updated to "*.safetensors" (and likewise for the .pt and .bin patterns), as illustrated below.
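
To illustrate, here is a quick check of the glob behavior, followed by a sketch of a corrected get_model() (untested, and assuming modelscope's snapshot_download keeps accepting model_id and ignore_file_pattern as before):

from fnmatch import fnmatch

# ignore_patterns is matched with fnmatch-style globs, where "." is a
# literal dot, so ".*.safetensors" only matches names starting with ".":
print(fnmatch("model.safetensors", ".*.safetensors"))  # False -> not ignored
print(fnmatch("model.safetensors", "*.safetensors"))   # True  -> ignored

A possible corrected version of the function:

import os
import huggingface_hub.constants


def get_model(pretrained_model_name_or_path: str):
    if os.getenv('VLLM_USE_MODELSCOPE', 'False').lower() == 'true':
        # modelscope's snapshot_download does accept model_id and
        # ignore_file_pattern, so this branch is unchanged.
        from modelscope import snapshot_download
        model_path = snapshot_download(
            model_id=pretrained_model_name_or_path,
            local_files_only=huggingface_hub.constants.HF_HUB_OFFLINE,
            ignore_file_pattern=[".*.pt", ".*.safetensors", ".*.bin"])
    else:
        # huggingface_hub expects repo_id and glob-style ignore_patterns.
        from huggingface_hub import snapshot_download
        model_path = snapshot_download(
            repo_id=pretrained_model_name_or_path,
            local_files_only=huggingface_hub.constants.HF_HUB_OFFLINE,
            ignore_patterns=["*.pt", "*.safetensors", "*.bin"])
    return model_path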

Let me know if I misunderstood something.

github-actions[bot] commented 1 week ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!