BradKML closed this issue 6 months ago
I may work on streaming the model's output in the plugin UI and letting the user accept the answer, regenerate it, or cancel.
The three options make sense, but the issue stems from the loading phase of the initial answer (assuming it runs on a laptop): even with a light model, the CLI and the Logseq plugin feel different while the response is being generated.
@BradKML I am afraid I am not following what you are saying. If I stream the output in the plugin UI, you won't have to wait until the whole answer is generated, just like when you use ollama with the CLI or any ChatBotUI.
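For context, with `stream: true` Ollama's `/api/generate` endpoint returns newline-delimited JSON, one `{"response": "...", "done": ...}` object per chunk, so the plugin could read tokens as they arrive roughly like this (a sketch, not the plugin's actual code; `generateStream` is a hypothetical helper):

```typescript
// Sketch: consume Ollama's streaming /api/generate endpoint.
// Each line of the response body is a JSON object carrying one
// token in its "response" field; "done" marks the end of the stream.
async function* generateStream(model: string, prompt: string): AsyncGenerator<string> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    body: JSON.stringify({ model, prompt, stream: true }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let pending = ""; // partial line carried over between reads

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    pending += decoder.decode(value, { stream: true });
    const lines = pending.split("\n");
    pending = lines.pop() ?? ""; // keep the incomplete last line
    for (const line of lines) {
      if (!line.trim()) continue;
      const chunk = JSON.parse(line);
      if (chunk.response) yield chunk.response;
      if (chunk.done) return;
    }
  }
}
```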
I will double-check how they handle the output; it might be something on my part.
I think the streaming typically used in chat UIs mimics the feeling of chatting. While it's not hard to add streaming to the plugin UI, it does seem to be a distraction from the document/block-based user experience of Logseq.
Okay, that clears things up a bit: the usual UX is word-by-word, but some prefer a more chat-like, message-at-a-time feel. The former feels more sensible in a low-GPU-power context, while the latter is sufficient with fast responses. Will find a way around it later.
When using other LLM UX tools (GPT4All and Khoj come to mind), rendering is done token by token, but since re-rendering on every token in Logseq might cause issues, would it be possible to render a result incrementally based on certain break characters (e.g. periods or newlines)?
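Something like this sketch is what I have in mind: buffer the streamed tokens and only update the block when a break character arrives (assuming `logseq.Editor.updateBlock` from `@logseq/libs`; `streamToBlock`, `blockUuid`, and how the token stream is obtained are hypothetical):

```typescript
import "@logseq/libs"; // registers the global `logseq` object

// Sketch: write streamed tokens into a Logseq block, but only
// re-render when a break character (period or newline) arrives,
// instead of on every single token.
const BREAKS = /[.\n]/;

async function streamToBlock(blockUuid: string, tokens: AsyncIterable<string>) {
  let rendered = ""; // text already flushed to the block
  let buffer = "";   // tokens waiting for the next break character

  for await (const token of tokens) {
    buffer += token;
    if (BREAKS.test(token)) {
      rendered += buffer;
      buffer = "";
      await logseq.Editor.updateBlock(blockUuid, rendered);
    }
  }
  // Flush whatever remains once the stream ends.
  if (buffer) {
    await logseq.Editor.updateBlock(blockUuid, rendered + buffer);
  }
}
```

Wired to the earlier sketch, that would look like `await streamToBlock(uuid, generateStream("llama2", prompt))`, so the block only repaints once per sentence or line rather than per token.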