nomic-ai / gpt4all

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
https://nomic.ai/gpt4all
MIT License

GPT4All v3.1.1: Replies from an LLM sometimes contain framing tokens from its System Prompt or Prompt Template #2779

Open · SINAPSA-IC opened this issue 4 months ago

SINAPSA-IC commented 4 months ago

Bug Report

Sometimes, framing tokens from the System Prompt or the Prompt Template of a model seep into some of its replies.

For example, a reply contained the sequence

evolution: Hyp<|im_start|> bradytely refers to

Also, further down the reply the error had somehow been undone: the broken word appeared restored, with the alien sequence replaced by the single letter/character that had previously been thrown out:

era. The hypobradytely hypothesis suggests

(https://en.wiktionary.org/wiki/hypobradytely)
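For context, `<|im_start|>` is one of the ChatML-style framing tokens that some prompt templates (for example, the one used by Nous Hermes 2 Mistral DPO) wrap around each message. Below is a minimal sketch of such a template, with the role names and message content as placeholder assumptions; these markers are meant to be consumed by the model, never echoed inside the visible reply text:

```python
# Minimal sketch of a ChatML-style prompt template (not GPT4All's exact template).
# The <|im_start|> / <|im_end|> markers are the "framing tokens" the bug report
# describes leaking into the reply.
def build_chatml_prompt(system_prompt: str, user_message: str) -> str:
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

# Hypothetical usage with placeholder content:
print(build_chatml_prompt("You are a helpful assistant.", "What is hypobradytely?"))
```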

This does not seem to be reproducible on demand:

Steps to Reproduce

I can't tell. In this instance, one LocalDocs collection was in use.

Expected Behavior

Replies should not contain framing tokens originating from the System Prompt and/or Prompt Template.

Your Environment

SINAPSA-IC commented 4 months ago

[Screenshot of the GPT4All UI: img_ui18]

benja0x40 commented 4 months ago

Similar issue with the following setup:

- GPT4All 3.1.1
- MacBook Air M2, 24 GB RAM, under macOS Sonoma 14.5
- Model: Gemma-2-9B-it-Q8_0

See screenshots for further details.

[Screenshots: 2024-08-01 at 15:00:36, 15:00:58, and 15:01:08]

ThiloteE commented 4 months ago

Please provide links to where you got the models from, as the date of quantization is very important, and so is the config that was used for quantization.

Gemma-2-9B-it is not officially supported yet. There are some issues with that model; you will have to quantize it with a custom configuration. Maybe Nous Hermes 2 Mistral DPO requires re-quantization too; I would need to look up its config files. It has been a while since it was added to GPT4All, and upstream llama.cpp has changed a lot during that time, so old quants might have to be deprecated. It could also be a sign that the model was fine-tuned badly, if it only happens very rarely.
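One way to check what configuration a quant was produced with is to inspect the GGUF metadata directly. Here is a rough sketch using the `gguf` Python package that ships with llama.cpp; the file path is a placeholder, and the exact keys present depend on how the quant was produced:

```python
# Rough sketch: list the metadata keys stored in a GGUF file. These include the
# quantization type and the tokenizer configuration (EOS/BOS token ids, chat
# template, etc.), which is the "config" being asked about here.
from gguf import GGUFReader

reader = GGUFReader("gemma-2-9b-it-Q8_0.gguf")  # placeholder path
for name in reader.fields:
    print(name)  # e.g. general.file_type, tokenizer.ggml.eos_token_id, ...
```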

benja0x40 commented 4 months ago

Here is the link to the Gemma-2 model quantization, dated July 16, which I downloaded on July 26: https://huggingface.co/lmstudio-community/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it-Q8_0.gguf

ThiloteE commented 4 months ago

Might also be related to https://github.com/nomic-ai/gpt4all/pull/2701

cebtenzzre commented 3 months ago

@benja0x40 your issue is not the same—this is the model intentionally sending one (effective) EOS token as it was trained to, but the GGUF file specifies a different one. Unless the model authors change their EOS token to the actual EOT token, or llama.cpp implements a way to have multiple recognized EOS tokens (e.g. by specifying EOS and EOT separately), we can only work around this (see #2439).
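To illustrate the workaround being described, a decoding loop could treat both the GGUF's declared EOS token and the model's end-of-turn token as terminators. This is only a minimal sketch of the idea, not llama.cpp's or GPT4All's implementation, and the token IDs are placeholders:

```python
# Sketch of recognizing more than one end-of-sequence token during generation.
# The IDs below are placeholders; in practice they would come from the GGUF
# metadata (e.g. tokenizer.ggml.eos_token_id) and the model's vocabulary.
EOS_TOKEN_ID = 1    # what the GGUF declares as EOS
EOT_TOKEN_ID = 107  # the end-of-turn token the model actually emits

STOP_TOKEN_IDS = {EOS_TOKEN_ID, EOT_TOKEN_ID}

def should_stop(sampled_token_id: int) -> bool:
    """Stop generation when any recognized terminator is sampled."""
    return sampled_token_id in STOP_TOKEN_IDS
```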

@SINAPSA-IC as to your issue, there are a few possible explanations, but it could have been fixed in #2778 (or even better, #2781). So I suggest you build from the latest main branch, or wait until the next release, before we investigate this further. But before #2701 these tokens were invisible and not printed, so the issue may have been happening but only the model could see it, and not the user.
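As a rough illustration of the last point, a special token can be present in the model's output yet never reach the user if the UI strips it before display. The snippet below is a hypothetical filter, not what #2778 or #2781 actually do, and the token pattern is an assumption rather than GPT4All's real special-token set:

```python
import re

# Hypothetical display-side filter: remove ChatML-style framing tokens from the
# text shown to the user. This hides the symptom only; it does not address the
# underlying prompt/sampling issue.
FRAMING_TOKENS = re.compile(r"<\|im_(start|end)\|>")

def strip_framing_tokens(chunk: str) -> str:
    return FRAMING_TOKENS.sub("", chunk)

print(strip_framing_tokens("evolution: Hyp<|im_start|> bradytely refers to"))
# -> "evolution: Hyp bradytely refers to"
```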

benja0x40 commented 3 months ago

@cebtenzzre Thanks for the explanation.