vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: Provided example for loading GGUF model is not working #7291

Open sarthakd112 opened 1 month ago

sarthakd112 commented 1 month ago

Your current environment

🐛 Describe the bug

Hi @Isotr0py @mgoin,

I ran the GGUF inference example gguf_inference and got a NotImplementedError.
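
For reference, the example boils down to roughly the following (a minimal sketch, not the exact script; the GGUF repo, filename, tokenizer, and prompt below are assumptions for illustration):

from huggingface_hub import hf_hub_download
from vllm import LLM, SamplingParams

# Download a GGUF-quantized checkpoint (repo/filename are assumptions).
model_path = hf_hub_download(
    repo_id="TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
    filename="tinyllama-1.1b-chat-v1.0.Q4_0.gguf",
)

# GGUF files don't ship a separate tokenizer config, so point vLLM at the
# base model's tokenizer.
llm = LLM(
    model=model_path,
    tokenizer="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
)

outputs = llm.generate(
    ["What is the capital of France?"],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)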


Could you please help me with this?

Isotr0py commented 1 month ago

GGUF quantization isn't supported on the CPU backend yet. You can try installing vLLM with the GPU backend.
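
As a quick sanity check before switching, you can confirm that a CUDA device is visible to PyTorch (a minimal check; vLLM's GPU backend requires this):

import torch

# vLLM's default GPU backend needs a CUDA-capable device visible to PyTorch.
print(torch.cuda.is_available())  # True -> the GPU backend can be used
print(torch.cuda.device_count())  # number of visible CUDA devices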

chintan-ushur commented 1 month ago

@Isotr0py -- I tried with the GPU backend and the error persists; please advise:

OSError: It looks like the config file at '/mnt/.cache/huggingface/hub/models--TheBloke--TinyLlama-1.1B-Chat-v1.0-GGUF/snapshots/52e7645ba7c309695bec7ac98f4f005b139cf465/tinyllama-1.1b-chat-v1.0.Q4_0.gguf' is not a valid JSON file.

Isotr0py commented 1 month ago

The released 0.5.4 version doesn't include GGUF support yet. You can build from source or install the latest nightly wheel:

export VLLM_VERSION=0.5.4 # vLLM's main branch version is currently set to latest released tag
pip install https://vllm-wheels.s3.us-west-2.amazonaws.com/nightly/vllm-${VLLM_VERSION}-cp38-abi3-manylinux1_x86_64.whl
# You can also access a specific commit
# export VLLM_COMMIT=...
# pip install https://vllm-wheels.s3.us-west-2.amazonaws.com/${VLLM_COMMIT}/vllm-${VLLM_VERSION}-cp38-abi3-manylinux1_x86_64.whl
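
After installing, you can verify that you actually picked up the nightly build rather than the 0.5.4 release (a quick check; both released and nightly wheels expose vllm.__version__):

import vllm

# A nightly wheel should report a dev version newer than the 0.5.4 release.
print(vllm.__version__)
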
sergeol commented 1 month ago

> The released 0.5.4 version doesn't include GGUF support yet. You can build from source or install the latest nightly wheel:
>
> export VLLM_VERSION=0.5.4 # vLLM's main branch version is currently set to latest released tag
> pip install https://vllm-wheels.s3.us-west-2.amazonaws.com/nightly/vllm-${VLLM_VERSION}-cp38-abi3-manylinux1_x86_64.whl
> # You can also access a specific commit
> # export VLLM_COMMIT=...
> # pip install https://vllm-wheels.s3.us-west-2.amazonaws.com/${VLLM_COMMIT}/vllm-${VLLM_VERSION}-cp38-abi3-manylinux1_x86_64.whl

This didn't work for me; I still get OSError: It looks like the config file at '/mnt/.cache/huggingface/hub/models--TheBloke--TinyLlama-1.1B-Chat-v1.0-GGUF/snapshots/52e7645ba7c309695bec7ac98f4f005b139cf465/tinyllama-1.1b-chat-v1.0.Q4_0.gguf' is not a valid JSON file.

Mashmoremail commented 3 weeks ago

I'm having the same issue. I tried installing a nightly commit, as suggested by @Isotr0py, but even then I get the same output as @sergeol no matter which GGUF model I try to load, whether from the Hugging Face Hub or a local path.

I wiped my conda and pip caches just to make sure nothing stale was cached and that I hadn't somehow ended up on an older version, but no luck. Even with a fresh install of Python, the GGUF functionality seems broken.

chintan-ushur commented 3 weeks ago

The solution provided by @Isotr0py works well for me.