vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Misc]: RuntimeError: Cannot find any model weights [vllm=0.4.0] #4333

Open vishwa27yvs opened 5 months ago

vishwa27yvs commented 5 months ago

Anything you want to discuss about vllm.

I run into the error below when using meta-llama/CodeLlama-7b-Instruct-hf with vllm==0.4.0 and torch==2.1.2. The same code works perfectly with vllm==0.2.1, but I want to use the most recent version of vllm for some additional functionality. It seems like a trivial error, and I have already tried several things: reinstalling PyTorch and transformers, verifying CUDA versions, etc.

Would be great to get any help!

Error msg: (RayWorkerVllm pid=1842765) ERROR 04-24 09:33:04 ray_utils.py:44] RuntimeError: Cannot find any model weights with 'meta-llama/CodeLlama-7b-Instruct-hf'

vishwa27yvs commented 5 months ago

Follow-up:

The code seems to work when I comment out load_format; it then defaults to "Using model weights format ['*.safetensors']". Is this expected?

from vllm import LLM

model = LLM(
    model=model_name_or_path,
    tensor_parallel_size=num_gpus,
    trust_remote_code=True,
    download_dir=HUGGINGFACE_CACHE,
    # load_format="pt",  # commenting this out makes loading work
    max_num_batched_tokens=max_model_len,
    max_model_len=max_model_len,
    swap_space=1,
    # max_num_batched_tokens=8192  # max_prompt_length
)
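
For what it's worth, one way to see why load_format="pt" might fail is to list the files the Hub repo actually ships. Below is a minimal sketch using huggingface_hub's list_repo_files (a dependency vLLM already pulls in); it assumes the CodeLlama repo publishes *.safetensors (and possibly *.bin) weights but no *.pt checkpoints, which would explain the "Cannot find any model weights" error when "pt" is forced:

from huggingface_hub import list_repo_files

# List every file in the model repo and count each weight format.
files = list_repo_files("meta-llama/CodeLlama-7b-Instruct-hf")
for ext in (".safetensors", ".bin", ".pt"):
    matches = [f for f in files if f.endswith(ext)]
    print(f"{ext}: {len(matches)} file(s)")

If the repo indeed has no *.pt files, then leaving load_format at its default ("auto"), or setting it to "safetensors", should pick up the published weights, while "pt" only matches *.pt checkpoints.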