sarthakd112 opened 1 month ago
GGUF quantization isn't supported on the CPU backend yet. You can try installing vLLM with the GPU backend instead.
@Isotr0py -- I tried with the GPU backend and the error persists. Please advise:
OSError: It looks like the config file at '/mnt/.cache/huggingface/hub/models--TheBloke--TinyLlama-1.1B-Chat-v1.0-GGUF/snapshots/52e7645ba7c309695bec7ac98f4f005b139cf465/tinyllama-1.1b-chat-v1.0.Q4_0.gguf' is not a valid JSON file.
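As an aside, this OSError typically means the loader tried to parse the binary GGUF file as a JSON config. A quick way to confirm that a file really is GGUF is to check its 4-byte magic header. This is a standalone sketch, not part of vLLM; the file names are hypothetical:

```python
# Minimal sketch: GGUF files start with the ASCII magic bytes b"GGUF".
# A JSON config will fail this check, which matches the error above.
from pathlib import Path

GGUF_MAGIC = b"GGUF"

def looks_like_gguf(path: str) -> bool:
    """Return True if the file starts with the GGUF magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == GGUF_MAGIC

# Demonstration with two tiny sample files (hypothetical names):
Path("model.gguf").write_bytes(GGUF_MAGIC + b"\x03\x00\x00\x00")
Path("config.json").write_text('{"model_type": "llama"}')
print(looks_like_gguf("model.gguf"))   # True
print(looks_like_gguf("config.json"))  # False
```

If the check fails on a file you downloaded, the download may be corrupted or the path may point at something other than the GGUF weights.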
The released 0.5.4 version doesn't include GGUF support yet. You can build from source or install the latest nightly wheel:
export VLLM_VERSION=0.5.4 # vLLM's main branch version is currently set to latest released tag
pip install https://vllm-wheels.s3.us-west-2.amazonaws.com/nightly/vllm-${VLLM_VERSION}-cp38-abi3-manylinux1_x86_64.whl
# You can also access a specific commit
# export VLLM_COMMIT=...
# pip install https://vllm-wheels.s3.us-west-2.amazonaws.com/${VLLM_COMMIT}/vllm-${VLLM_VERSION}-cp38-abi3-manylinux1_x86_64.whl
This didn't work for me; I still get:
OSError: It looks like the config file at '/mnt/.cache/huggingface/hub/models--TheBloke--TinyLlama-1.1B-Chat-v1.0-GGUF/snapshots/52e7645ba7c309695bec7ac98f4f005b139cf465/tinyllama-1.1b-chat-v1.0.Q4_0.gguf' is not a valid JSON file.
I'm having the same issue. I tried installing a nightly commit, as suggested by @Isotr0py, but even then I get the same output as @sergeol, no matter which GGUF model I try to load, whether from the Hugging Face Hub or a local path.
I wiped my conda and pip caches to make sure nothing stale was cached and that I hadn't somehow picked up an older version, but no: even on a fresh install of Python, the GGUF functionality seems broken.
The solution provided by @Isotr0py works well for me.
🐛 Describe the bug
Hi @Isotr0py @mgoin,
I ran the GGUF inference example (gguf_inference) and got a NotImplementedError.
Could you please help me with this?
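For reference, a minimal sketch of the flow the gguf_inference example follows: pass the local .gguf path as the model and point the tokenizer at the original Hugging Face repo. The function name and file path here are hypothetical, and this requires a GPU build of vLLM, so it is a sketch rather than a verified reproduction:

```python
# Hedged sketch of GGUF inference with vLLM (assumed usage: model is the
# local .gguf path, tokenizer is the original HF repo the GGUF was made from).
def run_gguf_inference(model_path: str) -> None:
    from vllm import LLM, SamplingParams  # requires a GPU build of vLLM

    llm = LLM(
        model=model_path,  # e.g. "./tinyllama-1.1b-chat-v1.0.Q4_0.gguf"
        tokenizer="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    )
    params = SamplingParams(temperature=0.0, max_tokens=32)
    for output in llm.generate(["What is the capital of France?"], params):
        print(output.outputs[0].text)

# Usage (on a machine with a supported GPU and a nightly vLLM wheel):
# run_gguf_inference("./tinyllama-1.1b-chat-v1.0.Q4_0.gguf")
```

If this still raises NotImplementedError on a nightly build, it may mean the specific quantization type (e.g. Q4_0) isn't implemented yet for your model architecture, which is worth stating in the report.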