turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

Add LLM-FTC-sampling #276

Closed — catid closed this pull request 8 months ago

catid commented 8 months ago

Implements the idea here: http://antirez.com/news/142
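For context, the cutoff described in that post keeps only the tokens whose probability is at least some fraction of the most likely token's probability. A minimal sketch of that filtering step (the function name `ftc_filter` and the `cutoff` parameter are illustrative, not the names used in this PR):

```python
import math

def ftc_filter(logits, cutoff=0.1):
    """Return the indices of tokens whose softmax probability is at
    least `cutoff` times the probability of the top token."""
    # Numerically stable softmax over the raw logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Threshold relative to the single most likely token.
    threshold = cutoff * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]

# Example: the third token is far below 10% of the top token's
# probability, so it is dropped.
print(ftc_filter([2.0, 1.0, -3.0], cutoff=0.1))  # → [0, 1]
```

Because the threshold scales with the top token's probability, the filter is aggressive when the model is confident and permissive when the distribution is flat — the same behavior turboderp notes below is shared with min-P.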

catid commented 8 months ago

Seems to improve the Mixtral model's ability to reason on my TextWorld benchmark by a statistically significant margin: https://x.com/MrCatid/status/1746308698887147626?s=20

turboderp commented 8 months ago

I'm sorry, but is this not pretty much identical to min-P?

catid commented 8 months ago

Yes, I looked into it: https://github.com/huggingface/transformers/issues/27670 — it seems to be the same idea.