Open SpkArtZen opened 4 weeks ago
Can you give us more details about your environment? It is probably related to the GPU and VRAM.
Yes, I use the default model, llama 3.1 7B.
Full logs: logs.txt. I send a single request from the Python SDK. It behaves the same with Postman and curl.
It should work the same with Postman and requests. Can you increase the request timeout?
client = PrivateGPTApi(base_url="http://localhost:8001", client=...)
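For reference, here is a minimal sketch of sending one request with an explicit, generous timeout using only the standard library. It assumes PrivateGPT's chat-completions endpoint is reachable at the same base URL as the snippet above; the endpoint path, payload shape, and 300-second value are illustrative assumptions, not taken from the thread.

```python
import json
import urllib.request

# Assumed endpoint: adjust the path/port to match your PrivateGPT setup.
URL = "http://localhost:8001/v1/chat/completions"
PAYLOAD = {
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": False,  # ask for a single, non-streamed answer
}

def send_once(timeout_seconds=300):
    """Send one request, allowing up to timeout_seconds for generation."""
    req = urllib.request.Request(
        URL,
        data=json.dumps(PAYLOAD).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout_seconds) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except OSError as exc:
        # A timeout here means the server never finished generating in time.
        return {"error": str(exc)}
```

With large contexts, the read timeout is the one that matters: the server holds the connection open while the model generates, so a short client-side timeout will abort a response that was still in progress.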
And two more things to take into account:
The main problem is that when I send a request, even through Postman, the response is generated multiple times and degrades each time. It is the same with the SDK and Postman. Also, it sends a request by itself:
2024-11-04 15:36:54 13:36:54.133 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
2024-11-04 15:36:59 [GIN] 2024/11/04 - 13:36:59 | 200 | 5.996617632s | 127.0.0.1 | POST "/api/chat"
After that, it generates a response again. I somehow need to accept only the first response.
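As a client-side workaround sketch, you could stop reading as soon as the first complete answer arrives. This assumes the server emits newline-delimited JSON chunks (adjust the parsing to whatever your endpoint actually returns); the helper name is hypothetical.

```python
import json

def first_response(lines):
    """Return the first complete JSON object from an iterable of lines,
    ignoring blank lines and unparseable (partial/keep-alive) chunks."""
    for raw in lines:
        if not raw:
            continue
        try:
            return json.loads(raw)  # stop at the first parseable response
        except json.JSONDecodeError:
            continue
    return None
```

For example, feeding it `['{"answer": "first"}', '{"answer": "second"}']` returns only the first object; any repeated generations after that are simply never read.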
Question
I have an issue with Private GPT:
When I send a prompt or chat completion with a large context (file size > 5 KB, or multiple context files), the chat takes a long time to generate a response but never sends it. It just keeps generating, and the delay gets worse each time. Eventually, it returns a timeout error.
I don’t know how to fix this. I need its initial response, but in the end I never receive anything.