noamgat / lm-format-enforcer

Enforce the output format (JSON Schema, Regex etc) of a language model
MIT License

ExllamaV2 optimizations #88

Closed bdashore3 closed 2 months ago

bdashore3 commented 2 months ago

Building the initial token tree is currently inefficient and can make ingestion slow (for example, when a JSON schema is supplied). This is most evident with large-vocabulary models such as Cohere Command-R, Gemma, and Qwen, where generation locks up and can take hours to process. These commits optimize that initial build when creating an ExllamaV2 LMFE filter.
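For readers unfamiliar with the term, here is a minimal sketch of what a token tree can look like: a character-level prefix tree (trie) over the decoded token strings of a vocabulary. This is illustrative only and is not the code from these commits or from LMFE; `TrieNode`, `build_token_trie`, `tokens_with_prefix`, and `toy_vocab` are hypothetical names. It does show why vocabulary size matters: every token is walked character by character, so the build cost grows with the total length of all token strings.

```python
from typing import Dict, List


class TrieNode:
    """One node of a character-level prefix tree over token strings."""

    __slots__ = ("children", "token_ids")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.token_ids: List[int] = []  # ids of tokens that end exactly at this node


def build_token_trie(vocab: Dict[int, str]) -> TrieNode:
    """Insert every decoded token string into the trie, one character at a time."""
    root = TrieNode()
    for token_id, text in vocab.items():
        node = root
        for ch in text:
            nxt = node.children.get(ch)
            if nxt is None:
                nxt = TrieNode()
                node.children[ch] = nxt
            node = nxt
        node.token_ids.append(token_id)
    return root


def tokens_with_prefix(root: TrieNode, prefix: str) -> List[int]:
    """Return the sorted ids of all tokens whose text starts with `prefix`."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return []  # no token starts with this prefix
    found: List[int] = []
    stack = [node]
    while stack:  # iterative walk of the subtree below the prefix
        cur = stack.pop()
        found.extend(cur.token_ids)
        stack.extend(cur.children.values())
    return sorted(found)


# Hypothetical toy vocabulary; a real tokenizer has tens or hundreds of thousands of entries.
toy_vocab = {0: "{", 1: "}", 2: '"', 3: '{"', 4: "tr", 5: "true", 6: "t"}
trie = build_token_trie(toy_vocab)
```

With a tree like this, a format enforcer can ask which tokens are compatible with the characters the schema allows next, instead of scanning the whole vocabulary at every step.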

Tests: Ran command-r with a JSON schema in TabbyAPI. With LMFE v0.9.5, generation would not start; with these commits, generation starts immediately.

References #75

Thanks @turboderp for creating these commits.

noamgat commented 2 months ago

Merged, thanks @bdashore3 and @turboderp for the contribution!