[Usage]: Tried using vllm with GGUF models. Got an infer device type error. #9051


asokans11 commented 1 month ago

Your current environment

The output of `python collect_env.py`

CODE:

```python
# FastAPI imports were missing from the original snippet; added for completeness.
from fastapi import FastAPI, Request, Response
from fastapi.responses import JSONResponse
from langchain.llms import VLLM
import time
import uvicorn

app = FastAPI()

llm = VLLM(
    model="tiiuae/falcon-7b-instruct",
    trust_remote_code=True,  # mandatory for hf models
    max_new_tokens=50,
    temperature=0.6,
)

@app.get("/")
def read_root():
    return {"Hello": "World"}

@app.post("/v1/generateText")
async def generateText(request: Request) -> Response:
    request_dict = await request.json()
    prompt = request_dict.pop("prompt")
    print(prompt)
    output = llm(prompt)
    print("Generated text:", output)
    ret = {"text": output}
    return JSONResponse(ret)
```

ERROR:

```
WARNING 10-03 20:16:19 _custom_ops.py:18] Failed to import from vllm._C with ImportError('libcuda.so.1: cannot open shared object file: No such file or directory')
Traceback (most recent call last):
  File "/home/ubuntu/llm/app1.py", line 35, in <module>
    llm = LLM(model="llama-2-7b-chat.Q5_K_S.gguf")
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/llm/myenv/lib/python3.12/site-packages/vllm/entrypoints/llm.py", line 214, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/llm/myenv/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 561, in from_engine_args
    engine_config = engine_args.create_engine_config()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/llm/myenv/lib/python3.12/site-packages/vllm/engine/arg_utils.py", line 873, in create_engine_config
    device_config = DeviceConfig(device=self.device)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/llm/myenv/lib/python3.12/site-packages/vllm/config.py", line 1081, in __init__
    raise RuntimeError("Failed to infer device type")
RuntimeError: Failed to infer device type
```
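An aside on the first line of that log: the `ImportError('libcuda.so.1: ...')` warning suggests the NVIDIA driver library is not visible to the process, in which case vLLM cannot detect a CUDA device and fails with "Failed to infer device type". A minimal sanity check, assuming a Linux host that is supposed to have a working NVIDIA driver:

```python
# Hedged diagnostic sketch: verify that PyTorch can see a GPU and that the
# dynamic linker can locate the CUDA driver library (libcuda.so.1 ships with
# the NVIDIA driver, not with the CUDA toolkit or pip packages).
import ctypes.util

import torch

print("CUDA available:", torch.cuda.is_available())        # expect True on a GPU host
print("libcuda found:", ctypes.util.find_library("cuda"))  # None => driver library not on the search path
```

If both checks fail, the fix lies at the driver or container level (e.g., installing the NVIDIA driver, or enabling GPU passthrough), not in the vLLM call itself.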

How would you like to use vllm

I want to use vllm with GGUF models
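For reference, once the device issue is resolved, a minimal sketch of loading a GGUF file with vLLM's own `LLM` class might look like the following (GGUF support was experimental at the time; the file path and tokenizer name below are placeholders):

```python
from vllm import LLM, SamplingParams

# Placeholder paths: point `model` at a local GGUF file. Passing a matching
# Hugging Face tokenizer is recommended, since GGUF files often lack a
# usable tokenizer configuration.
llm = LLM(
    model="llama-2-7b-chat.Q5_K_S.gguf",
    tokenizer="meta-llama/Llama-2-7b-chat-hf",
)

outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(temperature=0.6, max_tokens=50),
)
print(outputs[0].outputs[0].text)
```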


DarkLight1337 commented 1 month ago

To better debug the issue, please run `collect_env.py` from the vLLM repo and report the output.
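For anyone following along, fetching and running the script typically looks like this (assuming it still lives at the repository root):

```bash
wget https://raw.githubusercontent.com/vllm-project/vllm/main/collect_env.py
python collect_env.py
```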