Open SINAPSA-IC opened 4 months ago
Similar issue with the following setup:
GPT4All 3.1.1
MacBook Air M2, 24 GB RAM, macOS Sonoma 14.5
Model: Gemma-2-9B-it-Q8_0
See screenshots for further details.
Please provide a link to where you got the model from; the date of quantization is very important, as is the configuration that was used for quantization.
Gemma-2-9B-it is not officially supported yet; there are some known issues with that model, and you would have to quantize it with a custom configuration. Nous Hermes 2 Mistral DPO may require re-quantization too; I would need to look up its config files. It has been a while since it was added to GPT4All, and upstream llama.cpp has changed a lot during that time, so old quants might have to be deprecated. It could also be a sign of the model being finetuned badly, if it only happens very rarely.
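For anyone comparing quants: which end-of-sequence token a GGUF file declares can be read from its metadata. Below is a minimal sketch, assuming the `gguf` Python package that ships with llama.cpp and the metadata key names llama.cpp currently uses; the file name is just an example, and this is not part of GPT4All's tooling.

```python
# Minimal sketch: read the EOS/EOT token ids a GGUF file declares in its metadata.
# Assumes the `gguf` Python package from llama.cpp; the file path is an example.
from gguf import GGUFReader

def read_scalar(reader: GGUFReader, key: str):
    """Return a scalar metadata value, or None if the key is absent."""
    field = reader.fields.get(key)
    if field is None:
        return None
    # For scalar fields, the value lives in the part indexed by field.data[0].
    return int(field.parts[field.data[0]][0])

reader = GGUFReader("gemma-2-9b-it-Q8_0.gguf")
print("eos_token_id:", read_scalar(reader, "tokenizer.ggml.eos_token_id"))
print("eot_token_id:", read_scalar(reader, "tokenizer.ggml.eot_token_id"))  # may be absent in older quants
```

Comparing these ids between an old and a freshly made quant is one way to tell whether the file itself, rather than the model, is declaring the wrong stop token.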
Here is the link to the Gemma-2 model quantization, dated July 16, which I downloaded on July 26: https://huggingface.co/lmstudio-community/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it-Q8_0.gguf
Might also be related to https://github.com/nomic-ai/gpt4all/pull/2701
@benja0x40 your issue is not the same: in your case the model is intentionally sending one (effective) EOS token, as it was trained to, but the GGUF file specifies a different one. Unless the model authors change their EOS token to the actual EOT token, or llama.cpp implements a way to recognize multiple EOS tokens (e.g. by specifying EOS and EOT separately), we can only work around this (see #2439).
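To illustrate the kind of workaround involved (this is not how GPT4All or llama.cpp actually implement it), a client could treat several end-of-turn strings as stop sequences while streaming. The marker strings and the streaming source in the sketch below are assumptions.

```python
# Sketch of a client-side workaround: stop generation when any of several
# end-of-turn markers appears in the streamed text. The marker strings and the
# chunk source are hypothetical, not a real GPT4All/llama.cpp API.
from typing import Iterable, Iterator

STOP_STRINGS = ("<|im_end|>", "<end_of_turn>", "</s>")

def truncate_at_stop(chunks: Iterable[str], stop_strings=STOP_STRINGS) -> Iterator[str]:
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        for stop in stop_strings:
            idx = buffer.find(stop)
            if idx != -1:
                yield buffer[:idx]  # emit the text before the marker, then stop
                return
        # Keep a small tail in the buffer in case a marker is split across chunks.
        keep = max(len(s) for s in stop_strings) - 1
        if len(buffer) > keep:
            yield buffer[:-keep]
            buffer = buffer[-keep:]
    if buffer:
        yield buffer
```

Used as `"".join(truncate_at_stop(model_stream))`, this would hide both the declared EOS and an unexpected end-of-turn marker from the user, at the cost of the model still spending tokens after its intended stop.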
@SINAPSA-IC as for your issue, there are a few possible explanations, but it may have been fixed in #2778 (or, even better, #2781). So I suggest you build from the latest main branch, or wait until the next release, before we investigate further. Note that before #2701 these tokens were not printed at all, so the issue may already have been happening, visible to the model but not to the user.
@cebtenzzre Thanks for the explanation.
Bug Report
Sometimes, framing tokens from the System Prompt or the Prompt Template of a model seep into some of its replies.
One reply contained the sequence
evolution: Hyp<|im_start|> bradytely refers to
Further down the same reply, the error corrected itself: the broken word was restored, and the stray token sequence was replaced by the single character that had previously been dropped:
era. The hypobradytely hypothesis suggests
(https://en.wiktionary.org/wiki/hypobradytely)
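As a quick way to flag affected replies (purely illustrative, not part of GPT4All), saved responses can be scanned for ChatML-style framing tokens such as <|im_start|> and <|im_end|>, the template family this model uses; the token list below is an assumption and can be extended.

```python
# Illustrative check: flag replies that contain prompt-template framing tokens.
# The token list is an assumption based on ChatML-style templates; extend as needed.
import re

FRAMING_TOKENS = ["<|im_start|>", "<|im_end|>", "<|endoftext|>"]
PATTERN = re.compile("|".join(re.escape(t) for t in FRAMING_TOKENS))

def leaked_tokens(reply: str) -> list[str]:
    """Return any framing tokens that leaked into a model reply."""
    return PATTERN.findall(reply)

reply = "evolution: Hyp<|im_start|> bradytely refers to"
print(leaked_tokens(reply))  # ['<|im_start|>']
```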
This does not seem to be reproducible on demand.
Steps to Reproduce
I can't tell. This time, one LocalDocs collection was in use.
Expected Behavior
Replies should not contain framing tokens originating from the System Prompt and/or the Prompt Template.
Your Environment
GPT4All version: v3.1.1
Operating System: Windows 10 Pro, updated as of 2024.07
Chat model used (if applicable): Nous Hermes 2 Mistral DPO
Context Length: 30720
Temperature: 0.1
Top K: 50
Top P: 0.6
Min P: 0