runpod-workers / worker-vllm

The RunPod worker template for serving your large language model endpoint. Powered by vLLM.

GGUF compatibility #70

Closed · adam-clarey closed this 3 months ago

adam-clarey commented 3 months ago

I've used the runpod/worker-vllm:0.3.0-cuda11.8.0 container for several different LLMs and it has worked fine so far.

I've just been given a requirement to test a GGUF model (specifically https://huggingface.co/impactframes/llama3_if_ai_sdpromptmkr_q4km), and it keeps generating errors:

Entry Not Found for url: https://huggingface.co/impactframes/llama3_if_ai_sdpromptmkr_q4km/resolve/main/config.json.
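For what it's worth, the same error reproduces outside the worker with just huggingface_hub, which suggests the repo simply has no config.json (it only ships GGUF files). A minimal sketch, assuming nothing beyond the repo id above (EntryNotFoundError is huggingface_hub's exception for a file missing from a repo):

```python
# Sketch: reproduce the "Entry Not Found" outside the worker.
# The worker (via vLLM/transformers) fetches config.json from the Hub,
# but this repo only contains GGUF files, so the lookup 404s.
from huggingface_hub import hf_hub_download
from huggingface_hub.utils import EntryNotFoundError

repo_id = "impactframes/llama3_if_ai_sdpromptmkr_q4km"

try:
    hf_hub_download(repo_id=repo_id, filename="config.json")
except EntryNotFoundError as err:
    print(err)  # Entry Not Found for url: .../resolve/main/config.json
```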

Is this an issue with the model, or the worker? Is there a known workaround?

Thanks

ashleykleynhans commented 3 months ago

vLLM itself doesn't support GGUF, so the worker cannot support it either: https://github.com/vllm-project/vllm/issues/1002
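If you want to catch this before pointing the worker at a repo, a rough pre-flight check is sketched below. It is not part of worker-vllm, and looks_vllm_loadable is just an illustrative helper: vLLM expects a transformers-style repo with a config.json plus safetensors/bin weights, which GGUF-only repos don't have.

```python
# Sketch of a pre-flight check before handing a repo to the worker.
# looks_vllm_loadable is an illustrative helper, not part of worker-vllm.
from huggingface_hub import list_repo_files

def looks_vllm_loadable(repo_id: str) -> bool:
    """Heuristic: vLLM needs a transformers-style repo, i.e. a config.json
    plus .safetensors/.bin weights. GGUF-only repos ship neither."""
    files = list_repo_files(repo_id)
    has_config = "config.json" in files
    has_hf_weights = any(f.endswith((".safetensors", ".bin")) for f in files)
    return has_config and has_hf_weights

print(looks_vllm_loadable("impactframes/llama3_if_ai_sdpromptmkr_q4km"))  # False
print(looks_vllm_loadable("facebook/opt-125m"))                           # True
```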

alpayariyak commented 3 months ago

@ashleykleynhans's answer is correct.