p-e-w opened this issue 5 months ago
I keep hitting this issue because I have SillyTavern running alongside some ComfyUI nodes that use the same Oobabooga backend. It isn't necessary to cancel the first request to trigger the segmentation fault; it also happens when the two requests arrive at roughly the same time.
Since I don't update my instance unless I have a specific reason to, I've only started seeing this issue recently.
Describe the bug
With the llama.cpp loader, when a running API request is cancelled and a second API request is dispatched shortly afterwards, the whole application crashes with a segmentation fault. This appears to happen with any GGUF model (confirmed with Mixtral, Yi-34b, and Command R) when split CPU/GPU evaluation is used.
This has been happening for months, but only now have I managed to pinpoint a reliable reproduction. This might be the same thing described in #5630.
Is there an existing issue for this?
Reproduction
--api
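The cancel-then-redispatch pattern can be sketched as a small script. This is a sketch only: it uses a stub HTTP server in place of the real backend so the timing can be demonstrated without a running instance, and it assumes the OpenAI-compatible `/v1/completions` endpoint that `--api` exposes. Against the real server with a GGUF model split across CPU/GPU, the second request is where the segfault occurs.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class StubHandler(BaseHTTPRequestHandler):
    """Stands in for the text-generation-webui API; streams slowly."""
    def do_POST(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.end_headers()
        # Stream a few chunks so the client has time to cancel mid-response.
        for _ in range(50):
            try:
                self.wfile.write(b"data: token\n\n")
                self.wfile.flush()
            except (BrokenPipeError, ConnectionResetError):
                return  # client cancelled (closed the socket)

    def log_message(self, *args):
        pass  # keep the output quiet

server = ThreadingHTTPServer(("127.0.0.1", 0), StubHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

def post(path):
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request("POST", path, body=b"{}")
    return conn

# 1. Start a streaming request and cancel it mid-generation by closing
#    the socket -- the equivalent of the client aborting the API call.
first = post("/v1/completions")  # endpoint path is an assumption
first.getresponse().read(16)     # read a little of the stream, then...
first.close()                    # ...cancel the request

# 2. Immediately dispatch a second request. The stub completes normally;
#    the real backend crashes with a segmentation fault at this point.
second = post("/v1/completions")
resp = second.getresponse()
print(resp.status)
second.close()
server.shutdown()
```

Any two clients racing against the same backend (e.g. SillyTavern and a ComfyUI node) produce the same pattern, which is why cancellation isn't strictly required to hit the crash.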
Screenshot
No response
Logs
Nothing else in the logs.
System Info