sam-paech / antislop-sampler

Apache License 2.0

Possible issue in the custom code that picks up the next token #1

Open Mihaiii opened 2 weeks ago

Mihaiii commented 2 weeks ago

Hi!

I'm trying to make the output deterministic in preparation for a PR for something else. To isolate the issue, and since there's no option to set do_sample=False, I'm setting temperature=0.01 and slop_phrase_prob_adjustments={"sddsadsadwedqdqw": 0.5} (a nonsense phrase that should never be matched). My expectation was that this would fall back to the default generation path and produce deterministic output, but that doesn't happen.

Example to replicate:

for token in generate_antislop(
    model=model,
    tokenizer=tokenizer,
    prompt="Write a story about Elara, the weaver of tapestries in future Technopolis. In the bustling city, a group of ",
    max_length=300,
    temperature=0.01,
    min_p=0.1,
    slop_phrase_prob_adjustments={"sddsadsadwedqdqw": 0.5},
    adjustment_strength=102.0,
    streaming=True
):
    print(tokenizer.decode(token), end='', flush=True)

Expected behavior: the same output on every run. Actual behavior: the output differs from run to run.
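For context on why temperature=0.01 is expected to behave like greedy decoding, here is a minimal, self-contained sketch (plain Python, not the antislop-sampler code): dividing the logits by a very small temperature before the softmax pushes essentially all probability mass onto the argmax token, so even a sampling draw should pick the same token every time.

```python
import math
import random

def softmax(logits, temperature):
    # Scale logits by 1/T, then apply a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exponentiating
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, -1.0]

# At T=0.01 the distribution is effectively one-hot on the argmax:
probs = softmax(logits, 0.01)
assert probs[0] > 0.999999

# So a multinomial draw should agree with greedy decoding.
sampled = random.choices(range(len(logits)), weights=probs)[0]
greedy = max(range(len(logits)), key=lambda i: logits[i])
assert sampled == greedy
```

If the outputs still differ from run to run, the nondeterminism is presumably coming from somewhere other than the temperature-scaled softmax itself.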

sam-paech commented 2 weeks ago

Hi Mihai, thanks for the issue report -- I'm currently reworking how the logit downregulation & sampling work. There were some underlying conceptual flaws, but it's a lot more solid now.

I'll look into this issue shortly; I agree that setting the temperature to 0.01 should result in deterministic behaviour.
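One general point worth noting (a sketch, not specific to this repo): even at a very low but nonzero temperature, token selection still goes through an RNG, so run-to-run determinism also requires fixing the RNG state. The `sample_run` helper below is hypothetical; in a PyTorch pipeline the analogous step would be seeding via `torch.manual_seed` (and the CUDA equivalents) before generation.

```python
import random

def sample_run(seed):
    # Hypothetical helper: draw 10 tokens from a fixed distribution
    # using an RNG seeded explicitly, so repeated runs are identical.
    rng = random.Random(seed)
    probs = [0.6, 0.3, 0.1]
    return [rng.choices(range(3), weights=probs)[0] for _ in range(10)]

# Same seed -> identical token sequence across runs.
assert sample_run(0) == sample_run(0)

run_a = sample_run(0)
run_b = sample_run(0)
```

With an unseeded RNG the two runs would generally diverge, which is consistent with the behaviour reported above.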