Your current environment
CODE:
```python
from fastapi import FastAPI, Request, Response
from fastapi.responses import JSONResponse
from langchain.llms import VLLM
import time
import uvicorn

app = FastAPI()

llm = VLLM(
    model="tiiuae/falcon-7b-instruct",
    trust_remote_code=True,  # mandatory for hf models
    max_new_tokens=50,
    temperature=0.6,
)

@app.get("/")
def read_root():
    return {"Hello": "World"}

@app.post("/v1/generateText")
async def generateText(request: Request) -> Response:
    request_dict = await request.json()
    prompt = request_dict.pop("prompt")
    print(prompt)
    output = llm(prompt)
    print("Generated text:", output)
    ret = {"text": output}
    return JSONResponse(ret)
```
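For reference, the service is started with the imported uvicorn; a minimal entry point sketch (the file name app1.py comes from the traceback below, and the host/port are assumed illustrative defaults):

```python
# Assumed entry point; app1.py matches the path in the traceback below.
if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```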
ERROR:
```
WARNING 10-03 20:16:19 _custom_ops.py:18] Failed to import from vllm._C with ImportError('libcuda.so.1: cannot open shared object file: No such file or directory')
Traceback (most recent call last):
  File "/home/ubuntu/llm/app1.py", line 35, in <module>
    llm = LLM(model="llama-2-7b-chat.Q5_K_S.gguf")
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/llm/myenv/lib/python3.12/site-packages/vllm/entrypoints/llm.py", line 214, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/llm/myenv/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 561, in from_engine_args
    engine_config = engine_args.create_engine_config()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/llm/myenv/lib/python3.12/site-packages/vllm/engine/arg_utils.py", line 873, in create_engine_config
    device_config = DeviceConfig(device=self.device)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/llm/myenv/lib/python3.12/site-packages/vllm/config.py", line 1081, in __init__
    raise RuntimeError("Failed to infer device type")
RuntimeError: Failed to infer device type
```
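The ImportError above suggests the NVIDIA driver library (libcuda.so.1) is not visible to the process, which would explain why vLLM cannot infer a device type. A minimal check, using only ctypes and torch (which vLLM already depends on):

```python
# Quick diagnostic: if torch cannot see CUDA and libcuda.so.1 cannot be
# loaded, the "Failed to infer device type" error above is expected.
import ctypes
import torch

print("CUDA available:", torch.cuda.is_available())
try:
    ctypes.CDLL("libcuda.so.1")
    print("libcuda.so.1 loaded")
except OSError as exc:
    print("libcuda.so.1 not found:", exc)
```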
How would you like to use vllm
I want to use vllm with GGUF models
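For context, this is the GGUF usage being attempted, roughly following vLLM's GGUF example (GGUF support is experimental in vLLM; the tokenizer repo below is an assumption inferred from the model file name, and a CUDA-capable vLLM install is assumed):

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="llama-2-7b-chat.Q5_K_S.gguf",        # local GGUF file
    tokenizer="meta-llama/Llama-2-7b-chat-hf",  # assumed base HF tokenizer
)
outputs = llm.generate(["Hello!"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```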
Before submitting a new issue...
[X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.