nomic-ai / gpt4all

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
https://nomic.ai/gpt4all
MIT License
70.77k stars 7.71k forks

Top-P=1 produces fully deterministic chat responses #3087

Open brankoradovanovic-mcom opened 1 month ago

brankoradovanovic-mcom commented 1 month ago

Bug Report

When Top-P is set to 1, chat responses are fully deterministic, which shouldn't be the case.

Steps to Reproduce

  1. Take any model (I tested this with e.g. Meta Llama 3.1 8B Instruct, but I believe it happens with all models)
  2. Set the following parameters: Temperature=1, Top-P=1, Top-K=50, Min-P=0.05. These settings would normally be expected to produce fairly high randomness.
  3. Enter a prompt expected to produce varied responses, e.g. "Can you write a story about a bear and a fox?"
  4. The model produces an appropriate response
  5. Click "Redo last chat response"
  6. The model produces the exact same text (observed in 3.4.0 and 3.4.1)
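To see why step 6 is surprising, consider what the parameter combination above should do. A minimal sketch of a temperature → top-k → top-p → min-p sampler chain (a hypothetical illustration, not GPT4All's actual implementation) shows that with Temperature=1 and Top-P=1 the final token is still drawn at random from several candidates, so redone responses should diverge:

```python
import math, random

# Hypothetical sketch of a temperature -> top-k -> top-p -> min-p chain;
# not GPT4All's actual code.
def sample_token(logits, temperature=1.0, top_k=50, top_p=1.0, min_p=0.05, rng=random):
    scaled = [l / temperature for l in logits]            # temperature scaling
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]              # numerically stable softmax
    total = sum(exps)
    probs = sorted(((i, e / total) for i, e in enumerate(exps)),
                   key=lambda t: t[1], reverse=True)
    probs = probs[:top_k]                                 # top-k cutoff
    kept, mass = [], 0.0
    for tok, p in probs:                                  # top-p (nucleus) cutoff;
        kept.append((tok, p))                             # with top_p=1 nothing is dropped
        mass += p
        if mass >= top_p:
            break
    p_max = kept[0][1]
    kept = [(t, p) for t, p in kept if p >= min_p * p_max]  # min-p cutoff
    z = sum(p for _, p in kept)
    r = rng.random() * z                                  # random draw => non-deterministic
    for t, p in kept:
        r -= p
        if r <= 0:
            return t
    return kept[-1][0]

logits = [2.0, 1.9, 1.8, 1.0]
draws = {sample_token(logits) for _ in range(200)}
print(len(draws))  # more than one distinct token across 200 redraws
```

Under these settings only step 5's random draw should vary between redos; the bug behaves as if that draw were bypassed entirely.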

Expected Behavior

In 3.2.1, redoing the last chat response with the above settings produces significantly different responses each time.

Your Environment

chrisbarrera commented 1 month ago

Can confirm. Also, top_p = 0.9999 removes the determinism. top_p = 1 is only supposed to disable top-p processing (effectively passing all candidate logits through to the next sampler in the chain), not affect the other samplers in the chain. I am on version 3.4.0, macOS Sonoma 14.6.1.
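The no-op claim can be illustrated in isolation. In a standard nucleus-sampling cutoff (sketched below with hypothetical names, not GPT4All's code), the loop keeps accumulating probability mass until it reaches top_p, so a threshold of 1 can never trim the candidate list early:

```python
def top_p_filter(probs, top_p):
    """Keep the smallest prefix of descending-sorted probabilities whose
    cumulative mass reaches top_p (the nucleus-sampling cutoff)."""
    kept, mass = [], 0.0
    for p in probs:
        kept.append(p)
        mass += p
        if mass >= top_p:       # with top_p=1 this only triggers after the last token
            break
    return kept

probs = [0.4, 0.3, 0.2, 0.1]
print(top_p_filter(probs, 1.0))  # all four pass through -- a no-op, as expected
print(top_p_filter(probs, 0.5))  # [0.4, 0.3] -- only a lower threshold trims the tail
```

This is consistent with the observation that 0.9999 behaves normally: a correct implementation would barely differ between 0.9999 and 1, so the determinism at exactly 1 suggests that value is being special-cased (or mishandled) somewhere downstream.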