outlines-dev / outlines

Structured Text Generation
https://outlines-dev.github.io/outlines/
Apache License 2.0

Exllamav2 integration #1009

Open isamu-isozaki opened 2 days ago

isamu-isozaki commented 2 days ago

Presentation of the new feature

Outlines currently has a model that handles exllamav2. However, a recent exllamav2 update added a dynamic generator that is not compatible with Outlines' current generate method, because it uses paged attention, a radix cache, and so on. exllamav2 does provide its own mechanism for constrained generation with this generator, called filters. A demonstration of this with lm-format-enforcer is here. The idea of this feature is to do the same thing by moving the Outlines logic into filters, as in the integration scripts for other codebases.
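To make the filter idea concrete, here is a minimal, self-contained sketch of what I mean. It does not import exllamav2 or outlines: the `ToyFSMFilter` class, its `begin`/`feed`/`next` method names, the vocabulary, and the transition table are all hypothetical and only loosely modeled on how a filter-style hook works. The point is just that the constraint lives in a small stateful object the generator queries at every step, rather than in the generate loop itself.

```python
# Hypothetical toy vocabulary: token id -> text (illustration only).
VOCAB = {0: "yes", 1: "no", 2: "maybe", 3: "<eos>"}
EOS = 3

# Hypothetical token-level FSM that accepts "yes" or "no", then <eos>.
# Maps state -> {allowed token id: next state}.
TRANSITIONS = {
    0: {0: 1, 1: 1},
    1: {EOS: 2},
}


class ToyFSMFilter:
    """Stateful constraint object that a generator can query each step."""

    def __init__(self, transitions, start_state=0):
        self.transitions = transitions
        self.start_state = start_state
        self.state = start_state

    def begin(self):
        # Reset to the start state before a new generation.
        self.state = self.start_state

    def feed(self, token):
        # Advance the FSM with the token that was actually sampled.
        self.state = self.transitions[self.state][token]

    def next(self):
        # Return the set of token ids allowed in the current state.
        return set(self.transitions.get(self.state, {}))


def constrained_greedy(logits_per_step, flt):
    """Greedy decoding over precomputed logits, restricted by the filter."""
    flt.begin()
    output = []
    for logits in logits_per_step:
        allowed = flt.next()
        if not allowed:
            break  # FSM has no outgoing transitions: generation is done.
        # Pick the highest-scoring token among the allowed ones only.
        token = max(allowed, key=lambda t: logits[t])
        flt.feed(token)
        output.append(token)
        if token == EOS:
            break
    return output


# Stand-in logits for two steps; a real model would produce these.
fake_logits = [
    [0.1, 0.5, 2.0, 0.0],   # "maybe" scores highest but is not allowed
    [3.0, 0.1, 0.2, -1.0],  # only <eos> is allowed here
]
tokens = constrained_greedy(fake_logits, ToyFSMFilter(TRANSITIONS))
print([VOCAB[t] for t in tokens])
```

In the actual integration, the transition table would come from the Outlines FSM built for a regex or JSON schema, and the generator (not this toy loop) would call into the filter.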

Where does it fit in Outlines?

With the current lm-format-enforcer + exllamav2 combination, constrained generation is unreliable and somewhat buggy, and I noticed that using Outlines solves this issue.

Are you willing to open a PR?

I already made one.

Thanks for contributing!