Status: Open · opened by sabaimran 11 months ago
Mini-update. I looked into this today and found that gpt4all only supports short-circuiting the model response after tokens have already started streaming. So you can't stop it from "thinking", so to speak, once it has been given a query. To that end, I'll update the UI so that you can cancel the query once tokens start coming back, but not before then.
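To illustrate why cancellation only works once tokens are flowing, here is a minimal sketch. It does not use gpt4all itself: `fake_model_tokens`, `generate`, and `run_cancellable` are hypothetical stand-ins, assuming the model exposes a per-token callback that can return `False` to stop generation, while the prompt-processing step beforehand has no such hook. A UI cancel button would set the `threading.Event`, and the flag is only checked at token boundaries.

```python
import threading
import time
from typing import Callable, Iterator, List


# Hypothetical stand-in for a local model. The blocking sleep models prompt
# processing ("thinking"), which has no per-token hook and so cannot be stopped.
def fake_model_tokens(prompt: str, prefill_seconds: float) -> Iterator[str]:
    time.sleep(prefill_seconds)
    for token in ["Local", " models", " can", " be", " slow", "."]:
        yield token


def generate(prompt: str, on_token: Callable[[str], bool], prefill_seconds: float = 0.0) -> str:
    """Stream tokens to on_token and stop as soon as it returns False."""
    output: List[str] = []
    for token in fake_model_tokens(prompt, prefill_seconds):
        if not on_token(token):
            break  # short-circuit: only possible once tokens are flowing
        output.append(token)
    return "".join(output)


def run_cancellable(prompt: str, cancel: threading.Event, prefill_seconds: float = 0.0) -> str:
    # Hypothetical wiring: the UI's cancel button sets `cancel`, and the
    # per-token callback reports "keep going" only while it is unset.
    return generate(prompt, lambda _tok: not cancel.is_set(), prefill_seconds)


if __name__ == "__main__":
    cancel = threading.Event()
    # Simulate the user clicking "cancel" 0.1 s into a 0.5 s prefill.
    threading.Timer(0.1, cancel.set).start()
    start = time.monotonic()
    result = run_cancellable("Why is this slow?", cancel, prefill_seconds=0.5)
    elapsed = time.monotonic() - start
    # No tokens are returned, but the call still waits out the full prefill
    # before the cancel is noticed at the first token boundary.
    print(repr(result), f"{elapsed:.1f}s")
```

In the demo, the cancel arrives early but is only honored when the first token appears, which matches the limitation described above: the user still waits for time to first token.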
Hopefully the time-to-first-token issue will be less of a headache for folks using Mistral, which will become the default model in the next release (see commit https://github.com/khoj-ai/khoj/commit/0f1ebcae18abc8969cb367564077ef8d20695be3).
Responses from the local Llama chat model can sometimes take minutes. If you want to tweak your request and retry it, you have to wait for the current response to finish first. Add a way to send an interrupt signal from the UI to cancel an in-flight request.
See relevant discussion on Discord.