khoj-ai / khoj

Your AI second brain. Get answers to your questions, whether they come from the web or your own notes. Use online AI models (e.g. GPT-4) or private, local LLMs (e.g. Llama 3). Self-host locally or use our cloud instance. Access from Obsidian, Emacs, the Desktop app, the Web, or WhatsApp.
https://khoj.dev
GNU Affero General Public License v3.0

Improve Offline Chat Model Experience #494

Closed debanjum closed 11 months ago

debanjum commented 11 months ago

Closes #406

sabaimran commented 11 months ago

Great to be using built-in support for Llama V2 via GPT4All going forward! I'm a little confused about the PR description, though. You said "Make offline chat model user configurable", but it's still limited to only Llama V2, unless I'm missing something. What did you mean by that?


debanjum commented 11 months ago

> You said "Make offline chat model user configurable", but it's still limited to only Llama V2, unless I'm missing something. What did you mean by that?

Updated the PR description with more details on how to use a different offline chat model. Also made a few fixes to fall back to default max_prompt_size and tokenizer values so that this works.
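For readers following along: swapping in a different offline chat model would presumably come down to pointing the chat configuration at another GPT4All-compatible model file and, where the model isn't recognized, supplying the max_prompt_size and tokenizer explicitly rather than relying on the fallbacks. The sketch below is purely illustrative; the key names are assumptions, not Khoj's actual configuration schema.

```yaml
# Hypothetical offline chat configuration sketch (key names are assumed,
# not taken from Khoj's real config file).
processor:
  conversation:
    offline-chat:
      enable-offline-chat: true
      # Any GPT4All-compatible model file could, in principle, go here.
      chat-model: mistral-7b-instruct-v0.1.Q4_0.gguf
      # For models without built-in defaults, these would need to be set
      # explicitly; otherwise the default fallbacks apply.
      max-prompt-size: 2000
      tokenizer: mistralai/Mistral-7B-Instruct-v0.1
```

The point of the fallback fix described above is that the last two fields become optional: when a model ships without known limits, sensible defaults are used instead of erroring out.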