davidiwharper opened 6 months ago
My concern is that people are going to run out of RAM/VRAM if we increase the default context limit, and then complain that GPT4All is slower or crashes.
Is it correct that a model's maximum context length is n_ctx, and that this value is contained in the GGUF model file? If so, the value would not also need to be contained in models2.json. The GUI should still display it, next to the currently used context length, and ideally the GUI should also display the amount of GPU memory that was free when it started.
In the Python bindings n_ctx is always 2048, independent of the model as far as I can see. So it seems it is not read from the GGUF file, which should contain that parameter.
In the Python bindings n_ctx is always 2048
n_ctx defaults to 2048; you can change it when you construct a GPT4All instance, and the bindings will warn on the console if you exceed the model's trained context.
The reason it doesn't automatically set itself to the maximum is that more context requires more RAM/VRAM, and models that previously ran fine would then fail to load or crash once enough context had been used.
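For illustration, a minimal sketch of overriding the default in the Python bindings, assuming a recent gpt4all release whose constructor accepts n_ctx (the model filename here is only a placeholder for whatever GGUF model you have installed):

```python
from gpt4all import GPT4All

# n_ctx defaults to 2048; a larger value allocates a bigger KV cache,
# which costs additional RAM/VRAM.
model = GPT4All("example-model.Q4_0.gguf", n_ctx=4096)

with model.chat_session():
    reply = model.generate("Summarize the plot of Hamlet.", max_tokens=200)
    print(reply)
```

If the requested n_ctx exceeds the model's trained context, the bindings only warn on the console, so it is up to the caller to pick a sensible value.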
is there a way to read the max n_ctx of the model via the Python library?
is there a way to read the max n_ctx of the model via the Python library?
Right now you can do it by reading the value directly from the GGUF file using the gguf package - the key is Keys.LLM.CONTEXT_LENGTH.format(arch=arch), where arch is the value stored under Keys.General.ARCHITECTURE. If you'd like to be able to do it directly with the GPT4All bindings, you should open a feature request.
Feature request
Now that the context window is variable (per #1668) it would be helpful to have models2.json updated to populate the n_ctx field along with the correct system and user prompts.
This could be accomplished by adding a field contextLength or similar for each model.
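A hedged sketch of what such an entry might look like (the field name contextLength is the proposal above; the other field names and values are only illustrative placeholders, not copied from the real models2.json):

```json
{
  "name": "Example Model",
  "filename": "example-model.Q4_0.gguf",
  "promptTemplate": "### User:\n%1\n### Response:\n",
  "systemPrompt": "",
  "contextLength": 4096
}
```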
Motivation
At the moment, the default context window remains 2048 tokens, though it is user-configurable. Populating the n_ctx value when a model is installed would allow for more effective use of this new feature.
Your contribution
Looking through the models in models2.json, I think that the correct context values are: