nomic-ai / gpt4all

GPT4All: Chat with Local LLMs on Any Device
https://gpt4all.io
MIT License

Chat UI: set n_ctx in models2.json #1835

Open davidiwharper opened 6 months ago

davidiwharper commented 6 months ago

Feature request

Now that the context window is variable (per #1668), it would be helpful to have models2.json updated to populate the n_ctx field along with the correct system and user prompts.

This could be accomplished by adding a field such as contextLength (or similar) for each model.
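For illustration, here is a minimal sketch of what such an entry might look like, built as a Python dict and dumped as JSON. The contextLength field name and the abbreviated set of other fields are assumptions, not the actual models2.json schema.

```python
import json

# Hypothetical, abbreviated models2.json entry: "contextLength" is the proposed
# new field; the other field names and the filename are illustrative only.
entry = {
    "name": "Mistral OpenOrca",
    "filename": "mistral-7b-openorca.gguf2.Q4_0.gguf",
    "contextLength": 4096,  # trained context window for this model
}
print(json.dumps(entry, indent=2))
```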

Motivation

At the moment, the default context window remains 2048 tokens, albeit user-configurable. Populating the n_ctx value when a model is installed would allow this new feature to be used more effectively.

Your contribution

Looking through the models in models2.json, I think that the correct context values are:

  1. Mistral OpenOrca: 4096 (https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca/blob/main/config.json "sliding_window")
  2. Mistral Instruct: 4096 (https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1/blob/main/config.json "sliding_window")
  3. GPT4All Falcon: 2048 (https://huggingface.co/tiiuae/falcon-7b, the source model)
  4. Orca2 Medium: ??4096 (https://arxiv.org/pdf/2311.11045.pdf p8)
  5. Orca2 Full: ?4096 (https://huggingface.co/microsoft/Orca-2-13b model_max_length=4096)
  6. Wizard v1.2: 4096 (https://conclusionlab.com/llm/WizardLM-WizardLM-13B-V1.2)
  7. Hermes: 4096 (https://huggingface.co/NousResearch/Nous-Hermes-Llama2-13b/discussions/7)
  8. GPT4All Snoozy: 2048 (https://llm.extractum.io/model/TheBloke%2FGPT4All-13B-snoozy-GPTQ,4S2tA74MFGmT3FueEg9sUt)
  9. MPT Chat: 4096 (https://huggingface.co/mosaicml/mpt-7b-chat config.max_seq_len)
  10. Orca Mini: ??1024 (https://huggingface.co/pankajmathur/orca_mini_3b but this is unclear)
  11. EM German Mistral: 4096 (https://huggingface.co/jphme/em_german_mistral_v01/discussions/2)
cebtenzzre commented 6 months ago

My concern is that people are going to run out of RAM/VRAM if we increase the default context limit, and then complain that GPT4All is slower/crashing.

dlippold commented 6 months ago

Is it correct that the (maximum) context length of a model is n_ctx and that it is stored in the GGUF model file? If so, it would not be necessary to also include the value in models2.json. But the GUI should display that value next to the currently used context length, and ideally also the amount of GPU memory that was free when the GUI was started.

woheller69 commented 4 months ago

In the Python bindings, n_ctx is always 2048, independent of the model, as far as I can see. So it seems it is not read from the GGUF file, which should contain that parameter.

cebtenzzre commented 4 months ago

In the Python bindings n_ctx is always 2048

n_ctx defaults to 2048; you can change it when you construct a GPT4All, and it will warn on the console if you exceed the model's trained context.
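For example (a minimal sketch; it assumes a gpt4all Python package recent enough that the constructor accepts n_ctx, and the model filename is only an example):

```python
from gpt4all import GPT4All

# Request a 4096-token context instead of the 2048-token default.
# The filename is illustrative; use any locally installed GGUF model.
model = GPT4All("mistral-7b-openorca.gguf2.Q4_0.gguf", n_ctx=4096)

with model.chat_session():
    print(model.generate("Summarize this issue in one sentence.", max_tokens=128))
```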

The reason it doesn't automatically set itself to the maximum is that more context requires more RAM/VRAM, so models would fail to load, or crash once enough context has been used, where they didn't previously.

woheller69 commented 4 months ago

is there a way to read the max n_ctx of the model via the Python library?

cebtenzzre commented 4 months ago

is there a way to read the max n_ctx of the model via the Python library?

Right now you can do it by directly reading the value from the GGUF file using the gguf package - it's Keys.LLM.CONTEXT_LENGTH.format(arch=arch), where arch is the value corresponding to Keys.General.ARCHITECTURE. If you'd like to be able to do it directly with the GPT4All bindings, you should open a feature request.
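Roughly, something like the sketch below (the helper name and model path are made up, and the exact field-decoding steps may differ between gguf package versions):

```python
from gguf import GGUFReader, Keys  # pip install gguf

def trained_context_length(path: str) -> int:
    reader = GGUFReader(path)

    # "general.architecture" is a string field, e.g. "llama" or "falcon".
    arch_field = reader.fields[Keys.General.ARCHITECTURE]
    arch = bytes(arch_field.parts[arch_field.data[0]]).decode("utf-8")

    # "<arch>.context_length" holds the model's trained context size.
    ctx_field = reader.fields[Keys.LLM.CONTEXT_LENGTH.format(arch=arch)]
    return int(ctx_field.parts[ctx_field.data[0]][0])

print(trained_context_length("mistral-7b-openorca.gguf2.Q4_0.gguf"))
```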