oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.

Segmentation fault with llama.cpp when making two API requests in quick succession #5939

Open · p-e-w opened 5 months ago

p-e-w commented 5 months ago

Describe the bug

With the llama.cpp loader, cancelling a running API request and quickly dispatching a second one crashes the whole application with a segmentation fault. This appears to happen with any GGUF model (confirmed with Mixtral, Yi-34B, Command R) when evaluation is split between CPU and GPU.

This has been happening for months, but only now have I managed to pinpoint a reliable reproduction. This might be the same thing described in #5630.

Is there an existing issue for this?

Reproduction

  1. Start TGWUI with --api.
  2. Load a GGUF model with llama.cpp.
  3. Make a request to the text completion API.
  4. Cancel the request while it is running.
  5. Immediately (within a second or less) make another request identical to the first one (both steps are scripted below).
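
A minimal sketch of steps 3-5, assuming the OpenAI-compatible completion endpoint on the default port 5000 (the prompt, token count, and timeout are arbitrary):

```python
import requests

# Assumes TGWUI was started with --api and is serving the
# OpenAI-compatible API on the default port 5000.
API_URL = "http://127.0.0.1:5000/v1/completions"

payload = {
    "prompt": "Write a long story about a dragon.",
    "max_tokens": 512,
    "stream": True,
}

# Steps 3-4: start a streaming completion, read a single chunk,
# then close the connection mid-generation, i.e. cancel it.
with requests.post(API_URL, json=payload, stream=True, timeout=60) as response:
    next(response.iter_content(chunk_size=None))

# Step 5: immediately send an identical request. With the
# llama.cpp loader, the server segfaults around this point.
requests.post(API_URL, json=payload, timeout=60)
```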

Screenshot

No response

Logs

Segmentation fault (core dumped)

Nothing else in the logs.
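
Since the crash leaves nothing in the logs, a native backtrace would have to be captured externally, e.g. by running the server under gdb (the model path is a placeholder; adjust flags for your setup):

```
gdb -batch -ex run -ex bt --args python server.py --api --model <your-model>.gguf
```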

System Info

RTX 3060 12 GB on Ubuntu 22.04.

Xyem commented 3 months ago

I keep hitting this issue because I have SillyTavern running alongside some ComfyUI nodes, both of which use the same Oobabooga backend. Cancelling the first request isn't necessary to trigger the segmentation fault; it also happens when the two requests arrive at roughly the same time (see the sketch below).
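
For reference, a minimal sketch of this concurrent case, again assuming the default API port 5000 (the two threads stand in for SillyTavern and a ComfyUI node firing at once):

```python
import threading
import requests

API_URL = "http://127.0.0.1:5000/v1/completions"  # assumed default port
payload = {"prompt": "Hello, world.", "max_tokens": 256}

# Two clients hitting the backend at roughly the same time;
# no cancellation involved.
threads = [
    threading.Thread(target=requests.post, args=(API_URL,),
                     kwargs={"json": payload, "timeout": 120})
    for _ in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```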

Since I don't update my instance unless I have a specific reason to, I've only started hitting this issue recently.