Open vishwa27yvs opened 5 months ago
Follow up:
The code seems to work when I comment out `load_format`; it then defaults to `Using model weights format ['*.safetensors']`. Is this expected?
```python
model = LLM(
    model=model_name_or_path,
    tensor_parallel_size=num_gpus,
    trust_remote_code=True,
    download_dir=HUGGINGFACE_CACHE,
    # load_format="pt",
    max_num_batched_tokens=max_model_len,
    max_model_len=max_model_len,
    swap_space=1,
    # max_num_batched_tokens=8192  # max_prompt_length
)
```
Anything you want to discuss about vllm.
I run into the error below when using `meta-llama/CodeLlama-7b-Instruct-hf` with `vllm==0.4.0`, `torch==2.1.2`. The code works perfectly with `vllm==0.2.1`, but I want to use the most recent version of vllm for some additional functionality. It seems to be a trivial error, and I have tried several things: reinstalling pytorch and transformers, verifying CUDA versions, etc. Would be great to get any help!
Error msg:

```
(RayWorkerVllm pid=1842765) ERROR 04-24 09:33:04 ray_utils.py:44] RuntimeError: Cannot find any model weights with 'meta-llama/CodeLlama-7b-Instruct-hf'
```
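One way to narrow this down is to check which weight files the cached checkpoint actually contains: `load_format="pt"` looks for `*.bin`/`*.pt` files, so if the downloaded snapshot only ships `*.safetensors`, the loader finds nothing and raises the error above. A minimal diagnostic sketch (the `available_weight_formats` helper and the snapshot path are hypothetical, not part of vLLM):

```python
# Hypothetical diagnostic helper: group the weight files in a local HF
# snapshot directory by the vLLM load format they would satisfy. If the
# "pt" list is empty but "safetensors" is not, load_format="pt" will fail
# with "Cannot find any model weights".
from pathlib import Path


def available_weight_formats(snapshot_dir: str) -> dict[str, list[str]]:
    """Return weight filenames in snapshot_dir keyed by load format."""
    patterns = {
        "safetensors": "*.safetensors",  # default format in newer vLLM
        "pt": "*.bin",                   # classic PyTorch checkpoint shards
    }
    root = Path(snapshot_dir)
    return {
        fmt: sorted(p.name for p in root.glob(pat))
        for fmt, pat in patterns.items()
    }
```

Running this against the snapshot under `HUGGINGFACE_CACHE` for the model would confirm whether only safetensors shards were downloaded, which would explain why commenting out `load_format="pt"` makes it work.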