Open loayghawji opened 1 month ago
I have the same case: when 2 requests are sent simultaneously, it gives me a Segmentation Fault error.
this is only a prototype and serves one request at a time; you can't use it for production. Use vLLM instead of this, that's more robust
@imad-ict Thank you for your response. So if I go with Ollama instead of llamaCPP, will anything change?
When sending two requests to the API at the same time I get the following:
```
privateGPT-main\PrivateGpt\Lib\site-packages\llama_cpp\llama_cpp.py", line 1622, in llama_decode
    return _lib.llama_decode(ctx, batch)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: exception: access violation reading 0x0000000000002600
```
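As a stopgap until you move to a server built for concurrency (vLLM, Ollama, etc.), one common workaround is to serialize all access to the model with a lock, since a single llama.cpp context is not safe to call from two threads at once. Below is a minimal sketch of that pattern; `FakeLlama` is a hypothetical stand-in for a `llama_cpp.Llama` instance (so the snippet runs without the library installed), and `generate` stands in for whatever your API handler calls:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a llama_cpp.Llama instance. The real object is
# NOT thread-safe: overlapping llama_decode calls can crash the process
# with an access violation like the one in the traceback above.
class FakeLlama:
    def __init__(self):
        self._busy = False

    def __call__(self, prompt: str) -> str:
        # Overlapping entry here is what would segfault on a real model.
        assert not self._busy, "concurrent access to the model context"
        self._busy = True
        result = f"completion for: {prompt}"
        self._busy = False
        return result

llm = FakeLlama()
llm_lock = threading.Lock()  # one lock guarding the single model context

def generate(prompt: str) -> str:
    # Only one request at a time may touch the llama.cpp context; other
    # requests block here instead of crashing the process.
    with llm_lock:
        return llm(prompt)

# Simulate two simultaneous API requests.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(generate, ["hello", "world"]))
print(results)
```

This trades throughput for stability: concurrent requests queue up instead of crashing, which is usually acceptable for a prototype but not for production load.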