Open · tuobulatuo opened this issue 11 months ago
When I send two or more requests to the server at the same time, it crashes. Error logs below:

CUDA version: 11.7, NVIDIA driver version: 515.65.01

```
** On entry to SGEMM parameter number 13 had an illegal value
cuBLAS error 7 at /tmp/pip-install-_wvffp3m/llama-cpp-python_93b4c08269a545e2a4e8f946ea11d827/vendor/llama.cpp/ggml-cuda.cu:6140
current device: 0
CUDA error 4 at /tmp/pip-install-_wvffp3m/llama-cpp-python_93b4c08269a545e2a4e8f946ea11d827/vendor/llama.cpp/ggml-cuda.cu:455: driver shutting down
current device: 0
./bins/langchain_serve_test.sh: line 7: 311435 Segmentation fault (core dumped) python -u langchain_serve.py
```
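A minimal client sketch of the kind of traffic that triggers this, assuming a langserve app mounted at a hypothetical `/chain` path (the URL, path, and prompts are placeholders for whatever `langchain_serve.py` actually serves). A single request succeeds; two in flight crash the server:

```python
# Hypothetical reproduction: fire two concurrent /invoke calls at a
# langserve endpoint. The /chain path and prompts are assumptions.
from concurrent.futures import ThreadPoolExecutor

import requests


def invoke(prompt: str) -> str:
    # langserve runnables expose POST /invoke taking {"input": ...}
    # and returning {"output": ...}.
    resp = requests.post(
        "http://localhost:8000/chain/invoke",
        json={"input": prompt},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["output"]


# Two simultaneous requests are enough to hit concurrent llama.cpp access.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(invoke, ["Hello", "World"]))
print(results)
```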
Hello @tuobulatuo, this does not look like a langserve issue -- the underlying model probably cannot handle concurrency. Have you checked llama.cpp? I'll try to take a look later this week, but I think you'll need some sort of queue to handle the concurrent requests.

This looks relevant: https://github.com/ggerganov/llama.cpp/discussions/1871
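For anyone hitting this, here is a minimal sketch of that workaround, assuming the served chain wraps a llama-cpp-python model (the model path and the `guarded_chain` name are illustrative, not from the original report). An `asyncio.Lock` acts as a queue of one: concurrent requests wait on the lock instead of entering llama.cpp at the same time.

```python
# Sketch: serialize access to a llama.cpp model behind langserve.
# Model path and names are hypothetical; adapt to your own app.
import asyncio

from langchain_community.llms import LlamaCpp
from langchain_core.runnables import RunnableLambda

llm = LlamaCpp(model_path="models/llama-2-7b.Q4_K_M.gguf")  # hypothetical path
_lock = asyncio.Lock()


async def _invoke_serially(prompt: str) -> str:
    # Only one coroutine at a time may touch the model; concurrent /invoke
    # calls queue up here rather than reaching llama.cpp simultaneously.
    async with _lock:
        return await llm.ainvoke(prompt)


guarded_chain = RunnableLambda(_invoke_serially)
# add_routes(app, guarded_chain, path="/chain")  # serve with langserve as before
```

This only serializes requests within a single worker process; if the app runs multiple workers (each loading its own model), a proper cross-process queue or a single-worker deployment would be needed instead.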