Closed: JulianOestreich90 closed this issue 1 month ago.
To address the issues you're facing with connecting to a self-hosted remote LLM using llama.cpp within a Docker container, here are some focused steps and clarifications:

1. Error Handling Update: The AttributeError: 'HTTPStatusError' object has no attribute 'message' occurs because exception objects in Python 3 and modern HTTP client libraries no longer carry a .message attribute. Instead of exception.message, use str(exception) or exception.args to access the error message. This adjustment is necessary for both the OpenLLMAPI and OpenAILike classes when handling exceptions.
2. Using LlamaCPP with Docker: Ensure you have the llama-cpp-python package installed in your environment. This package is crucial for interfacing with llama.cpp models. If it's not installed, you can add it using pip install llama-cpp-python.
3. Model Configuration: When setting up your model, you can specify either a model_path for a pre-downloaded model or a model_url for dynamic downloading. Adjust your Docker setup accordingly to ensure the model is accessible or the container can download it.
4. Text Generation Parameters: Customize parameters like temperature, max_new_tokens, and context_window to control the model's output. Use the LlamaCPP class for setting these parameters and generating text (see the sketch after this list).
5. Docker Container Setup: Verify that your Docker container is correctly configured for model access (either through a local file or internet access for downloading) and that the llama-cpp-python package is installed within the container.
6. Advanced Features: Explore the LlamaCPP class's advanced features for applications like real-time chat interactions.

For detailed implementation, refer to the LlamaCPP class documentation and ensure your code is updated to handle exceptions correctly. This should help you successfully connect to and interact with your self-hosted LLM.
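As a rough sketch of points 3 and 4, a local LlamaCPP setup could look like the following. The model path, parameter values, and import path (which varies between llama_index versions) are assumptions to adapt to your own container layout:

```python
# Minimal sketch: running a local GGUF model through LlamaIndex's LlamaCPP wrapper.
# The model path and parameter values are placeholders, not tested settings.
from llama_index.llms.llama_cpp import LlamaCPP  # import path differs in older llama_index releases

llm = LlamaCPP(
    model_path="/models/mistral-7b-instruct-v0.2.Q4_0.gguf",  # pre-downloaded model inside the container
    temperature=0.1,
    max_new_tokens=256,
    context_window=32000,
    model_kwargs={"n_gpu_layers": 99},  # forwarded to llama-cpp-python
    verbose=True,
)

response = llm.complete("Explain in one sentence what a GGUF file is.")
print(response.text)
```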
To continue the conversation, mention @dosu.
@dosu why should I install llama-cpp-python on my server? Does that mean LlamaIndex cannot handle the API spawned by the llama.cpp HTTP server? It is supposed to be an OpenAI-compatible API as well.
The issue with the AttributeError: 'HTTPStatusError' object has no attribute 'message' you're encountering is due to the way exceptions are handled in the OpenLLMAPI class. Modern Python versions and HTTP client libraries don't put a .message attribute on exception objects. Instead, you should use str(exception) or inspect the attached response (for example, exception.response.status_code) to get the error information. This adjustment is necessary for proper error handling in your code.
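As a small illustration (using httpx directly; the URL below is only a placeholder), the error text is read from the exception itself rather than from a .message attribute:

```python
# Sketch: handling httpx.HTTPStatusError without relying on a .message attribute.
import httpx

try:
    resp = httpx.get("http://192.168.12.132:8080/v1/models")  # placeholder URL
    resp.raise_for_status()
except httpx.HTTPStatusError as exc:
    # HTTPStatusError has no .message; use str(exc) or the attached response object.
    print(str(exc))
    print(exc.response.status_code)
```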
Regarding the OpenAILike class connection errors, these could stem from various configuration issues such as incorrect API keys, missing dependencies, or incompatible model metadata. Ensure all configurations and dependencies are correctly set up.
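For reference, a minimal OpenAILike configuration pointed at a llama.cpp server might look like the sketch below; the host, port, model name, and context size are assumptions, and the import path depends on your llama_index version:

```python
# Sketch: pointing LlamaIndex's OpenAILike class at a llama.cpp server's
# OpenAI-compatible endpoint. Host, port, and model name are placeholders.
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    model="mistral-7b-instruct",               # llama.cpp usually serves one model and may ignore this
    api_base="http://192.168.12.132:8080/v1",  # the /v1 prefix targets the OpenAI-compatible routes
    api_key="not-needed",                      # dummy value; the underlying client still expects one
    is_chat_model=True,
    context_window=32000,
)

print(llm.complete("Say hello in one short sentence.").text)
```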
For your question about installing llama-cpp-python on your server: the LlamaCPP class within llama_index is designed to work directly with the llama-cpp library rather than through an HTTP server API. This direct integration requires the llama-cpp-python package to be installed on your server for the LlamaCPP class to function correctly. It uses the Llama class from the llama_cpp library for model initialization, completions, and other tasks, which is why the package is necessary even though the llama.cpp HTTP server provides an OpenAI-compatible API.
To continue the conversation, mention @dosu.
Hi @JulianOestreich90, in addition to the API you provided, llama.cpp also supports an OpenAI-compatible Chat Completions API (search for POST /v1/chat/completions). If you use that API, I guess the OpenAI class should suffice.
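For what it's worth, that endpoint can also be exercised directly with the openai Python client; the host, port, and model name below are placeholders:

```python
# Sketch: calling llama.cpp's OpenAI-compatible Chat Completions endpoint directly.
from openai import OpenAI

client = OpenAI(base_url="http://192.168.12.132:8080/v1", api_key="not-needed")

completion = client.chat.completions.create(
    model="mistral-7b-instruct",  # llama.cpp typically ignores this and uses the loaded model
    messages=[{"role": "user", "content": "Hello, who are you?"}],
)
print(completion.choices[0].message.content)
```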
Thanks for the suggestion of OpenAI, it did work for me although I've had to mess with the parameters a bit. I ended up with:
llm = OpenAI(api_key="somestring", api_base="http://192.168.12.132:3300")
Leaving the API key empty breaks it at the moment; it tries to do authentication with an invalid query.
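Building on that snippet, a quick smoke test could look like this (same placeholder address as above; the import path may differ depending on the llama_index version):

```python
# Sketch: quick end-to-end check against the llama.cpp server using the config above.
from llama_index.llms.openai import OpenAI

llm = OpenAI(api_key="somestring", api_base="http://192.168.12.132:3300")
print(llm.complete("Summarise what a giraffe is in one sentence.").text)
```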
See this gist for my full RAG script using a single RTX 3070 running Llama 8B served with the llama.cpp server. I've had to put the embeddings model on the CPU since my GPU does not have enough memory.
It's quite satisfying to watch. I randomly picked a few Wikipedia articles about giraffes and related concepts for my test, and I put the result in as a gist comment.
https://gist.github.com/jggc/5ba572a4753bc851e6d33d07e82b39f4
Question
I am using the llama.cpp HTTP server to serve a model with a Docker container:
docker run -d -p ipadress:8080:8080 -v ~/models:/models --gpus all ghcr.io/ggerganov/llama.cpp:server-cuda -m models/mistral-7b-instruct-v0.2.Q4_0.gguf -c 32000 --host 0.0.0.0 --port 8080 --n-gpu-layers 99
I can talk with the model via curl and it returns a response.
However, integrating it into LlamaIndex doesn't seem so easy. I tried these two options:
The OpenLLMAPI option errors with AttributeError: 'HTTPStatusError' object has no attribute 'message', and the OpenAILike class also gives connection errors. What can I do to connect properly to a self-hosted remote LLM?