noamgat / lm-format-enforcer

Enforce the output format (JSON Schema, Regex etc) of a language model
MIT License

Compatibility with ExLlamaV2 #21

Closed accountForIssues closed 7 months ago

accountForIssues commented 8 months ago

Thanks so much for sharing your project. I could get it working easily with transformers.
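For context, format enforcement in the transformers path works by masking the model's logits at each decoding step so that only format-legal tokens can be sampled. Here is a minimal pure-Python sketch of that idea; the token ids, scores, and allowed set are made up for illustration and are not the library's actual API:

```python
# Toy sketch of format enforcement by logit masking: at each decoding step,
# tokens that would break the required format get their scores pushed to -inf,
# so sampling can only pick a format-legal token.

def mask_disallowed(logits, allowed_ids):
    """Return a copy of `logits` where disallowed token ids score -inf."""
    neg_inf = float("-inf")
    return [score if i in allowed_ids else neg_inf
            for i, score in enumerate(logits)]

logits = [0.5, 2.0, 1.0, -0.3]   # raw model scores for 4 toy tokens
allowed = {1, 3}                  # ids the format allows next (illustrative)
masked = mask_disallowed(logits, allowed)
best = max(range(len(masked)), key=masked.__getitem__)
# greedy decoding now lands on an allowed token
```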

I have a few questions.

  1. Now that your PR has been merged into vLLM (the version is not yet released), does the example still work? Do we still need to patch the functions? I got a "Too many values to unpack" error during the penalties unpacking step. I followed the notebook to the letter.

  2. Can support be added for ExLlamaV2? The library is extremely performant, and being able to run 70B quants at 7B-to-13B speeds would be amazing.

Let me know if I should create separate issues. Thanks again.

noamgat commented 8 months ago

Regarding vLLM, I just ran the sample notebook and it worked correctly, so I would double-check your integration. We still need to patch the functions until the next vLLM release. When it is ready, I will add an official vLLM integration to the "integrations" module and simplify the sample notebook accordingly.

Regarding ExLlamaV2 - from a brief glance at the API it looks possible, as the sampler settings have a filters array, which would allow setting up a filter similar to how the huggingface integration works. However, I don't currently have time to develop it, and the API does not seem that well documented, so I won't get to it in the coming days. PRs welcome :)
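To illustrate the filter-array pattern mentioned above: the sampler asks a stateful filter which tokens are currently allowed, samples one of them, and then advances the filter. This is a toy pure-Python sketch of that pattern, not ExLlamaV2's actual classes; the character-level "vocabulary" and target string are invented for the example:

```python
# Toy "filter" in the style of a sampler filters array. A real filter would
# track a JSON schema or regex state machine; this one just enforces a fixed
# target string, one character-token at a time.

class SequenceFilter:
    """Allow only the tokens that continue a fixed target string."""

    def __init__(self, target):
        self.target = target
        self.pos = 0

    def allowed_tokens(self):
        if self.pos >= len(self.target):
            return set()              # sequence complete, nothing allowed
        return {self.target[self.pos]}

    def advance(self, token):
        assert token in self.allowed_tokens()
        self.pos += 1

f = SequenceFilter('{"a":1}')
out = []
while f.allowed_tokens():
    tok = next(iter(f.allowed_tokens()))  # "sampler" picks an allowed token
    f.advance(tok)
    out.append(tok)
result = "".join(out)
```

Whatever the model's raw preferences, the filter guarantees the output stays inside the target format.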

Otherwise, let's dedicate this post to ExLlamaV2 and see how many votes it gets.

jpeig commented 7 months ago

Would also vote to include ExLlamaV2. It's basically a 5x speed improvement, supercharging JSON output.

jpeig commented 7 months ago

Instead of ExLlamaV2, it may be an even better idea to provide support for https://github.com/huggingface/text-generation-inference.

It's similar to vLLM and includes support for ExLlamaV2, along with many other features.

noamgat commented 7 months ago

Small update: lm-format-enforcer 0.6.5 ships with an official vLLM integration layer, now that vLLM has released a version with the logits processor API. The vLLM sample notebook has been updated; it is much simpler and more powerful now (it includes regex and analysis examples).
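For context, vLLM's logits processor API passes a callable the token ids generated so far plus the current logits, and expects adjusted logits back. Below is a pure-Python sketch with that callable shape; the processor logic (banning the most recent token) and the toy vocabulary are illustrative, not the actual lm-format-enforcer integration code:

```python
# Toy logits processor with vLLM's callable shape:
# (generated token ids, current logits) -> adjusted logits.
# Here it simply forbids repeating the most recently emitted token.

def no_repeat_processor(token_ids, logits):
    out = list(logits)
    if token_ids:                     # ban the last emitted token id
        out[token_ids[-1]] = float("-inf")
    return out

# Suppose token id 2 was just generated; its score gets masked out.
adjusted = no_repeat_processor([2], [0.1, 0.4, 3.0, 0.2])
```

A format enforcer plugs into the same hook, but masks every token that would violate the target JSON schema or regex instead.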

jpeig commented 7 months ago

@noamgat

Quite significant changes, amazing. However, I created my own fork of the vLLM API server and hardcoded your custom sampler. Would you recommend switching it with the logits processor implementation? Would there be significant speed or accuracy gains?

noamgat commented 7 months ago

I would switch to the mainline version so you don't have a fork to maintain. Performance- and quality-wise they should be equivalent.


noamgat commented 7 months ago

As of v0.7.1, ExLlamaV2 is supported!

If you want to request support for text-generation-inference, please create a new issue, and also comment and thumbsup on this ticket so we can get the needed API support there: https://github.com/huggingface/text-generation-inference/issues/1269