Open SINAPSA-IC opened 4 months ago
Similar issue with the following setup:
GPT4All 3.1.1
MacBook Air M2, 24 GB RAM, macOS Sonoma 14.5
Model: Gemma-2-9B-it-Q8_0
See screenshots for further details.
Please provide a link to where you got the model from; the date of quantization is very important, as is the configuration that was used for quantization.
Gemma-2-9B-it is not officially supported yet; there are some known issues with that model, and you would have to quantize it with a custom configuration. Nous Hermes 2 Mistral DPO may require re-quantization too; I would need to look up its config files. It has been a while since it was added to GPT4All, and upstream llama.cpp has changed a lot during that time, so old quants might have to be deprecated. It could also be a sign of the model being finetuned badly, if it only happens very rarely.
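For anyone comparing quants: which end-of-sequence token a GGUF file declares can be read from its metadata. Below is a minimal sketch, assuming the `gguf` Python package that ships with llama.cpp and the metadata key names llama.cpp currently uses; the file name is just an example, and this is not part of GPT4All's tooling.

```python
# Minimal sketch: read the EOS/EOT token ids a GGUF file declares in its metadata.
# Assumes the `gguf` Python package from llama.cpp; the file path is an example.
from gguf import GGUFReader

def read_scalar(reader: GGUFReader, key: str):
    """Return a scalar metadata value, or None if the key is absent."""
    field = reader.fields.get(key)
    if field is None:
        return None
    # For scalar fields, the value lives in the part indexed by field.data[0].
    return int(field.parts[field.data[0]][0])

reader = GGUFReader("gemma-2-9b-it-Q8_0.gguf")
print("eos_token_id:", read_scalar(reader, "tokenizer.ggml.eos_token_id"))
print("eot_token_id:", read_scalar(reader, "tokenizer.ggml.eot_token_id"))  # may be absent in older quants
```

Comparing these ids between an old and a freshly made quant is one way to tell whether the file itself, rather than the model, is declaring the wrong stop token.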
Here is the link to the Gemma-2 model quantization, dated July 16, which I downloaded on July 26: https://huggingface.co/lmstudio-community/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it-Q8_0.gguf
Might also be related to https://github.com/nomic-ai/gpt4all/pull/2701
@benja0x40 your issue is not the same: in your case the model is intentionally sending one (effective) EOS token, as it was trained to, but the GGUF file specifies a different one. Unless the model authors change their EOS token to the actual EOT token, or llama.cpp implements a way to recognize multiple EOS tokens (e.g. by specifying EOS and EOT separately), we can only work around this (see #2439).
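To illustrate the kind of workaround involved (this is not how GPT4All or llama.cpp actually implement it), a client could treat several end-of-turn strings as stop sequences while streaming. The marker strings and the streaming source in the sketch below are assumptions.

```python
# Sketch of a client-side workaround: stop generation when any of several
# end-of-turn markers appears in the streamed text. The marker strings and the
# chunk source are hypothetical, not a real GPT4All/llama.cpp API.
from typing import Iterable, Iterator

STOP_STRINGS = ("<|im_end|>", "<end_of_turn>", "</s>")

def truncate_at_stop(chunks: Iterable[str], stop_strings=STOP_STRINGS) -> Iterator[str]:
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        for stop in stop_strings:
            idx = buffer.find(stop)
            if idx != -1:
                yield buffer[:idx]  # emit the text before the marker, then stop
                return
        # Keep a small tail in the buffer in case a marker is split across chunks.
        keep = max(len(s) for s in stop_strings) - 1
        if len(buffer) > keep:
            yield buffer[:-keep]
            buffer = buffer[-keep:]
    if buffer:
        yield buffer
```

Used as `"".join(truncate_at_stop(model_stream))`, this would hide both the declared EOS and an unexpected end-of-turn marker from the user, at the cost of the model still spending tokens after its intended stop.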
@SINAPSA-IC as for your issue, there are a few possible explanations, but it may have been fixed in #2778 (or, even better, #2781). So I suggest you build from the latest main branch, or wait until the next release, before we investigate further. Note that before #2701 these tokens were not printed at all, so the issue may already have been happening, visible to the model but not to the user.
@cebtenzzre Thanks for the explanation.
Bug Report
Sometimes, framing tokens from the System Prompt or the Prompt Template of a model seep into some of its replies.
One reply contained the sequence
evolution: Hyp<|im_start|> bradytely refers to
Further down the same reply, the error corrected itself: the broken word was restored, and the stray token sequence was replaced by the single character that had previously been dropped:
era. The hypobradytely hypothesis suggests
(https://en.wiktionary.org/wiki/hypobradytely)
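As a quick way to flag affected replies (purely illustrative, not part of GPT4All), saved responses can be scanned for ChatML-style framing tokens such as <|im_start|> and <|im_end|>, the template family this model uses; the token list below is an assumption and can be extended.

```python
# Illustrative check: flag replies that contain prompt-template framing tokens.
# The token list is an assumption based on ChatML-style templates; extend as needed.
import re

FRAMING_TOKENS = ["<|im_start|>", "<|im_end|>", "<|endoftext|>"]
PATTERN = re.compile("|".join(re.escape(t) for t in FRAMING_TOKENS))

def leaked_tokens(reply: str) -> list[str]:
    """Return any framing tokens that leaked into a model reply."""
    return PATTERN.findall(reply)

reply = "evolution: Hyp<|im_start|> bradytely refers to"
print(leaked_tokens(reply))  # ['<|im_start|>']
```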
This does not seem to be reproducible on demand.
Steps to Reproduce
I can't tell. This time, one LocalDocs collection was in use.
Expected Behavior
Replies should not contain framing tokens originating from the System Prompt and/or the Prompt Template.
Your Environment
GPT4All version: v3.1.1
Operating System: Windows 10 Pro, updated as of 2024.07
Chat model used (if applicable): Nous Hermes 2 Mistral DPO
Context Length: 30720
Temperature: 0.1
Top K: 50
Top P: 0.6
Min P: 0