henk717 / KoboldAI

KoboldAI is generative AI software optimized for fictional use, but capable of much more!
http://koboldai.com
GNU Affero General Public License v3.0

Token streaming doesn't work #517

Open LoganDark opened 6 months ago

LoganDark commented 6 months ago

kobold_debug.json

For some reason, token streaming just does not work. It's enabled, and the terminal output from the server updates on every token, but no messages are actually sent over the websocket to the UI, so nothing can be displayed until the response is complete. No idea what is going on.

I'm on the latest United commit 1e985ed51bcdde506c223a93017aa05647792063.
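For reference, this is roughly how I was checking whether any per-token messages reach the client at all. It's a minimal sketch, assuming the United UI on localhost:5000 streams tokens as Socket.IO events on the default namespace; the event names and payloads are not taken from the KoboldAI source, the catch-all handler just logs whatever arrives:

```python
# Diagnostic sketch: connect to the United UI's Socket.IO endpoint and log
# every event, to see whether any per-token messages arrive mid-generation.
# Assumes python-socketio >= 5 (pip install "python-socketio[client]") and a
# United instance on localhost:5000; namespace/auth details may differ.
import socketio

sio = socketio.Client()

@sio.on('*')  # catch-all handler: fires for any event name
def catch_all(event, *args):
    print(f'event={event!r} args={args!r}')

sio.connect('http://localhost:5000')
sio.wait()  # keep listening while a generation runs in the browser
```

During a generation I see nothing token-shaped come through until the final result.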

henk717 commented 6 months ago

What kind of model / API was it hooked up to?

LoganDark commented 6 months ago

I thought that would be in the debug JSON, but I've tried with both LLaMA 2 and Mixtral 8x7B in GGUF format, running on KoboldCpp (with cuBLAS and full offload to a 3090). I'm using the KoboldAI United UI (localhost:5000, not Lite).

henk717 commented 6 months ago

United can't stream over the API; that's why streaming is missing.

LoganDark commented 6 months ago

What do you mean it can't stream over the API? So it can't stream at all?

henk717 commented 6 months ago

It can stream when you use Hugging Face based models in the main UI.

LoganDark commented 6 months ago

So I can't use my 3090 to run models? Or I can't use GGUF files?

henk717 commented 6 months ago

You can't use GGUFs combined with United and streaming. You can stream when you use KoboldCpp directly with its own bundled KoboldAI Lite.
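If you only want to confirm that the KoboldCpp backend itself streams, you can also hit its SSE endpoint directly and skip both UIs. A minimal sketch, assuming a KoboldCpp instance on localhost:5001 exposing /api/extra/generate/stream; check your KoboldCpp version's API docs for the exact path and payload fields:

```python
# Sketch: stream tokens straight from KoboldCpp over server-sent events,
# bypassing the United UI entirely. Assumes KoboldCpp on localhost:5001 with
# the /api/extra/generate/stream endpoint; verify against your version.
import json
import requests

payload = {"prompt": "Once upon a time", "max_length": 64}

with requests.post(
    "http://localhost:5001/api/extra/generate/stream",
    json=payload,
    stream=True,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        # SSE frames look like: "event: message" / "data: {...}" / blank line
        if line and line.startswith("data:"):
            data = json.loads(line[len("data:"):].strip())
            print(data.get("token", ""), end="", flush=True)
print()
```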

LoganDark commented 6 months ago

OK, so the solution is to not use GGUF then? The Lite UI is mostly unusable for me (it works fine, it just has an awful user experience).

henk717 commented 6 months ago

Yes, the backends built into KoboldAI United should work (Hugging Face, exllama2).