xjdr-alt / entropix

Entropy Based Sampling and Parallel CoT Decoding
Apache License 2.0

Use JSON filler instead of PAUSE tokens #69

Open · xonfour opened this issue 6 days ago

xonfour commented 6 days ago

Thanks for the exciting approach! I've been playing around with evaluating entropy for a while now, but haven't yet considered varentropy. I'll have to change that. ;)

I'd like to point out a very powerful approach as a replacement for CoT or PAUSE tokens: using legal filler syntax in enforced JSON output, such as spaces or newlines. In my experience, this works very well WITHOUT any fine-tuning.

JSON syntax would also offer further possibilities in this context, such as forcing intermediate "reasoning" steps.
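
To make this more concrete, here's a rough sketch (the field names are only illustrative, not a fixed scheme):

```python
# Illustrative sketch: a JSON schema that forces an intermediate "reasoning"
# field before the final answer. Because whitespace between JSON tokens is
# legal, the model can also emit spaces/newlines as "free" filler without
# breaking the enforced format.
ANSWER_SCHEMA = {
    "type": "object",
    "properties": {
        "reasoning": {"type": "string"},  # intermediate "thinking" step
        "response": {"type": "string"},   # final answer shown to the user
    },
    "required": ["reasoning", "response"],
}

# Both strings are valid instances of the schema; the second simply contains
# extra whitespace that acts like PAUSE/filler tokens:
compact = '{"reasoning":"...","response":"..."}'
padded = '{\n\n  "reasoning": "...",\n\n\n  "response": "..."\n}'
```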

I've documented the approach at https://www.reddit.com/r/LocalLLaMA/comments/1g0ukv4/from_instinct_to_insight_how_roar_transforms_ai/ and will see how I can incorporate Entropix (and then hopefully finally publish the code).

theblackcat102 commented 4 days ago

Do you have any benchmarks for this claim? I don't think adding JSON syntax is the best choice, as "reasoning" rarely occurs in JSON syntax, and enforcing it would likely degrade reasoning. Here's a paper that studies this extensively: https://arxiv.org/abs/2408.02442

xonfour commented 2 days ago

Unfortunately, I don't have any benchmarks; I don't have the resources for that at the moment. My code is complex and messy and contains many more optimizations and feedback loops that I would have to strip out first. So I'm only making claims here, and it's completely fine to ignore my statement and close this issue (there was no other option here).

What I can say is that I have what I think is a very powerful chatbot that started using padding tokens (pad, space, tab, newline) on its own. I'm attaching a simple but real example:

[Attached screenshot: "paddings" — example output containing padding tokens]

That already worked very well with Mixtral 8x7B, and I'm currently using Gemma 2 9B, neither of which is a particularly large model. The bot uses the "reasoning" keys to think about the interaction and its response, which works extremely well.

A few months ago I started calculating the entropy, and in my experiments it (mostly) decreased as more padding tokens appeared. Forcing additional padding didn't work well at the time, and I didn't pursue the approach any further. Until now.
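
For clarity, by entropy I just mean the Shannon entropy of the next-token distribution at each decoding step; something like this (a simplified sketch, not my actual code):

```python
# Simplified sketch: per-step entropy of the next-token distribution,
# i.e. the quantity described above as (mostly) decreasing when more
# padding tokens appear in the output.
import torch
import torch.nn.functional as F


def next_token_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Entropy (in nats) of the softmax distribution over the vocabulary.

    logits: tensor of shape (..., vocab_size) from a single decoding step.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    return -(log_probs.exp() * log_probs).sum(dim=-1)
```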

To enforce my predefined JSON schema, I use lm-format-enforcer (rough sketch below). Thanks for the paper. I'm not surprised that quality drops under enforced restrictions; that matches both my intuition and my initial experience. The additional "processing cycles" more than make up for it, though.
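
Roughly, the setup looks like this (a simplified sketch using lm-format-enforcer's Hugging Face transformers integration; the model, schema, and prompt are only illustrative):

```python
# Simplified sketch of enforcing a JSON schema with lm-format-enforcer
# via its Hugging Face transformers integration.
from transformers import AutoModelForCausalLM, AutoTokenizer
from lmformatenforcer import JsonSchemaParser
from lmformatenforcer.integrations.transformers import (
    build_transformers_prefix_allowed_tokens_fn,
)

model_name = "google/gemma-2-9b-it"  # any causal LM should work here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

schema = {
    "type": "object",
    "properties": {
        "reasoning": {"type": "string"},
        "response": {"type": "string"},
    },
    "required": ["reasoning", "response"],
}

# The prefix function masks out tokens that would break the JSON schema,
# while legal whitespace between JSON tokens remains available as filler.
prefix_fn = build_transformers_prefix_allowed_tokens_fn(
    tokenizer, JsonSchemaParser(schema)
)

inputs = tokenizer("Answer as JSON: why is the sky blue?", return_tensors="pt")
output = model.generate(
    **inputs, max_new_tokens=256, prefix_allowed_tokens_fn=prefix_fn
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```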

I realize that this is not a scientific approach. But should I keep quiet about it? No. ;)

akarshghale commented 9 hours ago

JSON output actually degrades quality unless the model has been specifically trained for it, so I think it's better not to use it.