helixml / helix

Multi-node production AI stack. Run the best of open source AI easily on your own servers. Easily add knowledge from documents and scrape websites. Create your own AI by fine-tuning open source models. Integrate LLMs with APIs. Run gptscript securely on the server
https://tryhelix.ai

when the UI connects to a livestream session, prefix can be truncated #166

Open lukemarsden opened 8 months ago

lukemarsden commented 8 months ago

When the UI connects to a live streaming session (one where the model is still generating output), it only receives updates sent after it connects, so it often misses the start of the response.

This is particularly noticeable now that NATS has made the output damn quick.

Ideally, when you open a websocket to start streaming responses, the first thing you receive is a single chunk containing all of the result so far for the given interaction. Will we need to start buffering this in the API server?
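
To make the idea concrete, here is a rough sketch of what buffering in the API server might look like. It is an illustration only, not Helix code: `ReplayStream`, `Publish` and `Subscribe` are hypothetical names, and it uses plain Go channels in place of the real websocket and NATS plumbing. The point is that taking the snapshot and registering the subscriber under the same lock means a late joiner gets everything so far in one chunk and then only new chunks.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// ReplayStream is a hypothetical per-interaction buffer. It remembers all
// output generated so far and fans new chunks out to live subscribers.
type ReplayStream struct {
	mu   sync.Mutex
	buf  strings.Builder
	subs map[chan string]struct{}
}

func NewReplayStream() *ReplayStream {
	return &ReplayStream{subs: make(map[chan string]struct{})}
}

// Publish appends a chunk to the buffer and forwards it to every subscriber.
// The send blocks if a subscriber's channel is full; a real server would
// need a policy for slow consumers (an assumption left out of this sketch).
func (s *ReplayStream) Publish(chunk string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.buf.WriteString(chunk)
	for ch := range s.subs {
		ch <- chunk
	}
}

// Subscribe returns everything generated so far as a single chunk, plus a
// channel that receives later chunks. Both happen under one lock, so no
// chunk is missed or delivered twice.
func (s *ReplayStream) Subscribe() (string, <-chan string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	ch := make(chan string, 64)
	s.subs[ch] = struct{}{}
	return s.buf.String(), ch
}

func main() {
	s := NewReplayStream()
	s.Publish("Hello, ")
	s.Publish("wor")

	soFar, live := s.Subscribe() // the UI connects mid-generation
	s.Publish("ld")

	fmt.Println(soFar + <-live) // prints "Hello, world"
}
```

The buffer lives in memory here, so it would be lost on an API server restart and would not be shared between multiple API server instances.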

rusenask commented 8 months ago

We could probably either use NATS streaming, or keep updating the database with deltas every 3-5 seconds while it's generating and then load from the DB.

lukemarsden commented 8 months ago

Would NATS streaming solve this with stream level persistence?

I like that more, because with the 3-5 second option, matching up the end of the database text with the start of the streaming text is tricky.
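
For what it's worth, the matching gets less tricky if every streamed delta carries the byte offset at which it starts in the full response. The sketch below assumes exactly that; `Chunk`, `Merge` and `ErrResync` are made-up names for illustration, and nothing in this thread says Helix tags deltas this way. Given the text loaded from the database and an incoming chunk, it drops the overlapping bytes and reports an error when there is a gap.

```go
package main

import (
	"errors"
	"fmt"
)

// Chunk is a hypothetical streamed delta, tagged with the byte offset at
// which its text starts in the full response.
type Chunk struct {
	Offset int
	Text   string
}

// ErrResync means the chunk does not line up with the text we hold, so the
// caller should reload the snapshot from the database.
var ErrResync = errors.New("chunk offset does not line up with snapshot")

// Merge appends chunk c to the text already held, skipping any bytes that
// the snapshot already covers.
func Merge(have string, c Chunk) (string, error) {
	switch {
	case c.Offset < 0 || c.Offset > len(have):
		return have, ErrResync // invalid offset, or a gap in the data
	case c.Offset+len(c.Text) <= len(have):
		return have, nil // chunk is entirely covered by the snapshot
	default:
		return have + c.Text[len(have)-c.Offset:], nil
	}
}

func main() {
	snapshot := "Hello, wo" // text loaded from the DB, slightly stale

	// The first live chunk overlaps the snapshot by two bytes ("wo").
	merged, err := Merge(snapshot, Chunk{Offset: 7, Text: "world"})
	fmt.Println(merged, err) // prints "Hello, world <nil>"
}
```

This only removes the overlap problem; if the first live chunk starts past the end of the database text, the client still has to reload, which is the case stream-level persistence would avoid.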

rusenask commented 8 months ago

yeah I think it would

lukemarsden commented 8 months ago

Cool let's plug it in

lukemarsden commented 8 months ago

Maybe there's also a quicker workaround to get the frontend to subscribe to the websocket faster when a new session is created?