Open loayghawji opened 1 month ago
I have the same case: when 2 requests are sent simultaneously, it gives me a Segmentation Fault error.
this is only a prototype and serves one request at a time; you can't use it for production. Use vLLM instead of this, that's more robust
@imad-ict Thank you for your response. So if I go with Ollama instead of llamaCPP, will anything change?
When sending two requests to the API at the same time I get the following:
```
privateGPT-main\PrivateGpt\Lib\site-packages\llama_cpp\llama_cpp.py", line 1622, in llama_decode
    return _lib.llama_decode(ctx, batch)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: exception: access violation reading 0x0000000000002600
```
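As a stopgap until you move to a server built for concurrency (vLLM, Ollama, etc.), one common workaround is to serialize all access to the model with a lock, since a single llama.cpp context is not safe to call from two threads at once. Below is a minimal sketch of that pattern; `FakeLlama` is a hypothetical stand-in for a `llama_cpp.Llama` instance (so the snippet runs without the library installed), and `generate` stands in for whatever your API handler calls:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a llama_cpp.Llama instance. The real object is
# NOT thread-safe: overlapping llama_decode calls can crash the process
# with an access violation like the one in the traceback above.
class FakeLlama:
    def __init__(self):
        self._busy = False

    def __call__(self, prompt: str) -> str:
        # Overlapping entry here is what would segfault on a real model.
        assert not self._busy, "concurrent access to the model context"
        self._busy = True
        result = f"completion for: {prompt}"
        self._busy = False
        return result

llm = FakeLlama()
llm_lock = threading.Lock()  # one lock guarding the single model context

def generate(prompt: str) -> str:
    # Only one request at a time may touch the llama.cpp context; other
    # requests block here instead of crashing the process.
    with llm_lock:
        return llm(prompt)

# Simulate two simultaneous API requests.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(generate, ["hello", "world"]))
print(results)
```

This trades throughput for stability: concurrent requests queue up instead of crashing, which is usually acceptable for a prototype but not for production load.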