[Usage]: Tried using vllm with GGUF models. Got an infer device type error. #9051


asokans11 commented 1 month ago

Your current environment

The output of `python collect_env.py`

CODE:

```python
# FastAPI imports were missing from the original snippet; added for completeness.
from fastapi import FastAPI, Request, Response
from fastapi.responses import JSONResponse
from langchain.llms import VLLM
import time
import uvicorn

app = FastAPI()

llm = VLLM(
    model="tiiuae/falcon-7b-instruct",
    trust_remote_code=True,  # mandatory for hf models
    max_new_tokens=50,
    temperature=0.6,
)

@app.get("/")
def read_root():
    return {"Hello": "World"}

@app.post("/v1/generateText")
async def generateText(request: Request) -> Response:
    request_dict = await request.json()
    prompt = request_dict.pop("prompt")
    print(prompt)
    output = llm(prompt)
    print("Generated text:", output)
    ret = {"text": output}
    return JSONResponse(ret)
```

ERROR:

```
WARNING 10-03 20:16:19 _custom_ops.py:18] Failed to import from vllm._C with ImportError('libcuda.so.1: cannot open shared object file: No such file or directory')
Traceback (most recent call last):
  File "/home/ubuntu/llm/app1.py", line 35, in <module>
    llm = LLM(model="llama-2-7b-chat.Q5_K_S.gguf")
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/llm/myenv/lib/python3.12/site-packages/vllm/entrypoints/llm.py", line 214, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/llm/myenv/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 561, in from_engine_args
    engine_config = engine_args.create_engine_config()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/llm/myenv/lib/python3.12/site-packages/vllm/engine/arg_utils.py", line 873, in create_engine_config
    device_config = DeviceConfig(device=self.device)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/llm/myenv/lib/python3.12/site-packages/vllm/config.py", line 1081, in __init__
    raise RuntimeError("Failed to infer device type")
RuntimeError: Failed to infer device type
```
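An aside on the first line of that log: the `ImportError('libcuda.so.1: ...')` warning suggests the NVIDIA driver library is not visible to the process, in which case vLLM cannot detect a CUDA device and fails with "Failed to infer device type". A minimal sanity check, assuming a Linux host that is supposed to have a working NVIDIA driver:

```python
# Hedged diagnostic sketch: verify that PyTorch can see a GPU and that the
# dynamic linker can locate the CUDA driver library (libcuda.so.1 ships with
# the NVIDIA driver, not with the CUDA toolkit or pip packages).
import ctypes.util

import torch

print("CUDA available:", torch.cuda.is_available())        # expect True on a GPU host
print("libcuda found:", ctypes.util.find_library("cuda"))  # None => driver library not on the search path
```

If both checks fail, the fix lies at the driver or container level (e.g., installing the NVIDIA driver, or enabling GPU passthrough), not in the vLLM call itself.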

How would you like to use vllm

I want to use vllm with GGUF models
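For reference, once the device issue is resolved, a minimal sketch of loading a GGUF file with vLLM's own `LLM` class might look like the following (GGUF support was experimental at the time; the file path and tokenizer name below are placeholders):

```python
from vllm import LLM, SamplingParams

# Placeholder paths: point `model` at a local GGUF file. Passing a matching
# Hugging Face tokenizer is recommended, since GGUF files often lack a
# usable tokenizer configuration.
llm = LLM(
    model="llama-2-7b-chat.Q5_K_S.gguf",
    tokenizer="meta-llama/Llama-2-7b-chat-hf",
)

outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(temperature=0.6, max_tokens=50),
)
print(outputs[0].outputs[0].text)
```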


DarkLight1337 commented 1 month ago

To better debug the issue, please run `collect_env.py` from the vLLM repo and report the output.
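For anyone following along, fetching and running the script typically looks like this (assuming it still lives at the repository root):

```bash
wget https://raw.githubusercontent.com/vllm-project/vllm/main/collect_env.py
python collect_env.py
```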