mrepetto-certx opened 8 months ago
Repeating the steps by simply cloning the repo and following #1445 causes the same problem, plus the following:
There was a problem when trying to write in your cache folder (/nonexistent/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory.
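The error points at the Hugging Face cache path not being writable inside the container. A minimal sketch of one possible workaround, assuming the private-gpt service from the repo's docker-compose.yaml (the cache path chosen here is an assumption, not something from this thread):

```yaml
services:
  private-gpt:
    environment:
      # Assumption: point the Hugging Face cache at a directory the container user can write to.
      # HF_HOME is the newer umbrella variable; TRANSFORMERS_CACHE is the one named in the error.
      HF_HOME: /home/worker/app/local_data/hf_cache
      TRANSFORMERS_CACHE: /home/worker/app/local_data/hf_cache
```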
@mrepetto-certx, can you be more specific please? I have the same issue
Well. To reproduce:

```bash
git clone https://github.com/imartinez/privateGPT
cd privateGPT
docker compose build
docker compose run --rm --entrypoint="bash -c '[ -f scripts/setup ] && scripts/setup'" private-gpt
```

I do not know how to be more specific than that.
I think `local` should be substituted with `ollama`: https://github.com/imartinez/privateGPT/commit/45f05711eb71ffccdedb26f37e680ced55795d44
Indeed,

```yaml
services:
  private-gpt:
    build:
      dockerfile: Dockerfile.local
    volumes:
      - ./local_data/:/home/worker/app/local_data
      - ./models/:/home/worker/app/models
    ports:
      - 8001:8080
    environment:
      PORT: 8080
      PGPT_PROFILES: docker
      PGPT_MODE: llamacpp
```

mostly works, but it still requires an embedding mode, which is different from llamacpp.
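For context, the embedding mode is declared separately from the llm mode in the settings profile; a minimal sketch of what the docker profile might need (the huggingface embedding mode here is an assumption, not confirmed in this thread):

```yaml
# settings-docker.yaml (sketch, not the repo's actual file)
llm:
  mode: llamacpp
embedding:
  mode: huggingface   # assumption: the embedding side is configured independently of the llm side
```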
I am still getting the same error even when I change to llamacpp. Are there any prerequisites before running docker-compose build, such as setting any environment variables or downloading any modules?
Unfortunately I got the same result as you. The problem comes from the split between the llm and embedding sections in the local settings file. I suggest using Ollama and adding it as an additional container in the compose file.
I think I can help a little. If you are trying to use Ollama, which you will need to get installed and running first, then make these changes.

In settings.yaml, change localhost to host.docker.internal here:

```yaml
ollama:
  llm_model: llama2
  embedding_model: nomic-embed-text
  api_base: http://host.docker.internal:11434
```

In docker-compose.yaml, change dockerfile: Dockerfile.local to dockerfile: Dockerfile.external.

In Dockerfile.external, add these extras:

```dockerfile
RUN poetry install --extras "ui vector-stores-qdrant llms-ollama embeddings-ollama"
```

Then do a docker compose build and then docker compose up.

You will probably need to run ollama pull nomic-embed-text if you get the error about not having nomic.
I hope this helps. I was able to finally get it running on my M2 MacBook Air.
I made these changes:

- Added `RUN poetry install --extras "ui embeddings-huggingface llms-llama-cpp vector-stores-qdrant llms-ollama embeddings-ollama"` in my Dockerfile.local
- Set `PGPT_MODE: ollama` in my docker-compose
- Downloaded the ollama docker image and ran it separately (see the sketch after this list)
- Ran `ollama pull nomic-embed-text` in my ollama docker container
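For the "ran it separately" step, roughly this (a sketch based on Ollama's published Docker instructions; the container name and named volume are assumptions):

```bash
# Run the Ollama server container, persisting models in a named volume (assumption: CPU-only).
docker run -d --name ollama -p 11434:11434 -v ollama:/root/.ollama ollama/ollama

# Pull the embedding model inside that container.
docker exec -it ollama ollama pull nomic-embed-text
```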
I am still facing this issue:

```
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/embeddings (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xffff4cd571d0>: Failed to establish a new connection: [Errno 111] Connection refused'))
```

My ollama server is running; however, when I GET http://localhost:11434/api/embeddings, I get a 404. Any ideas on this? @makeSmartio
What about step 1, changing localhost to api_base: http://host.docker.internal:11434/ in settings.yaml? The problem with localhost is that the container resolves localhost to itself; host.docker.internal is the host's address from the container's point of view.
I also get a 404 for http://localhost:11434/api/embeddings, so no issue there.
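One caveat worth noting (an assumption on my part, relevant only on Linux hosts): host.docker.internal is not defined there by default, so the compose service may need an explicit mapping, for example:

```yaml
services:
  private-gpt:
    extra_hosts:
      # Assumption: map host.docker.internal to the Docker host gateway on Linux (Docker 20.10+).
      - "host.docker.internal:host-gateway"
```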
What is your take on decoupling it so that ollama is used as a microservice? Something like:
```yaml
services:
  private-gpt:
    build:
      dockerfile: Dockerfile.local
    volumes:
      - ./local_data/:/home/worker/app/local_data
    ports:
      - 8001:8080
    environment:
      PORT: 8080
      PGPT_PROFILES: docker
      PGPT_MODE: ollama
  ollama:
    image: ollama/ollama
    command: ollama pull nomic-embed-text
```
With the settings-ollama.yaml:

```yaml
server:
  env_name: ${APP_ENV:ollama}

llm:
  mode: ollama
  max_new_tokens: 512
  context_window: 3900
  temperature: 0.1       # The temperature of the model. Increasing the temperature will make the model answer more creatively. A value of 0.1 would be more factual. (Default: 0.1)

embedding:
  mode: ollama

ollama:
  llm_model: mistral
  embedding_model: nomic-embed-text
  api_base: http://ollama:11434
  tfs_z: 1.0             # Tail free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting.
  top_k: 40              # Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40)
  top_p: 0.9             # Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9)
  repeat_last_n: 64      # Sets how far back for the model to look back to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx)
  repeat_penalty: 1.2    # Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1)
  request_timeout: 120.0 # Time elapsed until ollama times out the request. Default is 120s. Format is float.

vectorstore:
  database: qdrant

qdrant:
  path: local_data/private_gpt/qdrant
```
@mrepetto-certx Makes sense to me. Even if people already have Ollama installed this would just be another instance. You'd still need to tackle the addressing problem, though - it would either need to be http://host.docker.internal:11434/ for host installations or http://ollama:11434/ for Dockerized.
Edit: It would also take quite a bit of testing to add the llm and embedding models for the dockerized method.
Thanks @makeSmartio. I'm experimenting now, with the caveat of having:

```yaml
ollama:
  image: ollama/ollama:latest
  volumes:
    - ./ollama:/root/.ollama
```

to avoid the problem of pulling a new model on every docker compose run. I'll keep you posted.
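A named volume would work as well (a sketch, not from the thread; the volume name is an assumption), the trade-off being that models live in Docker-managed storage rather than in the repo directory:

```yaml
services:
  ollama:
    image: ollama/ollama:latest
    volumes:
      - ollama-models:/root/.ollama   # assumption: named volume so pulled models persist across runs

volumes:
  ollama-models:
```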
No way; I keep getting:

```
[WARNING ] llama_index.core.chat_engine.types - Encountered exception writing response to history: [Errno 99] Cannot assign requested address
```

What is puzzling is that running:

```python
from llama_index.llms.ollama import Ollama

model = Ollama(model="mistral", base_url="http://ollama:11434", request_timeout=120.0)
resp = model.complete("Who is Paul Graham?")
print(resp)
```

inside the container works.
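If it helps narrow it down, a quick connectivity check from inside the running container (a sketch; it assumes the service names used in the compose snippets above and only Python's standard library):

```bash
# Assumption: services are named private-gpt and ollama as in the compose sketches above.
# /api/tags lists the models currently available on the Ollama server.
docker compose exec private-gpt python -c \
  "import urllib.request; print(urllib.request.urlopen('http://ollama:11434/api/tags').read())"
```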
Ok, I managed to make it work and pushed pull request #1812. The only thing to remember is to run ollama pull the first time to load the models; after that they stay in the host environment, similar to the previous behavior.
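That first-time pull might look like this (an assumption based on the compose sketches earlier in the thread, not on the contents of #1812):

```bash
# Assumption: an `ollama` service as defined in the compose file above.
docker compose up -d ollama
docker compose exec ollama ollama pull mistral
docker compose exec ollama ollama pull nomic-embed-text
```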
I tried to run:

```bash
docker compose run --rm --entrypoint="bash -c '[ -f scripts/setup ] && scripts/setup'" private-gpt
```

with a compose file somewhat similar to the repo's, but I got the following error: