Currently, building the initial token tree is inefficient and can make ingesting constraint tokens (for example, from a JSON schema) very slow. This is most evident with large-vocabulary models such as Cohere Command R, Gemma, and Qwen, where generation locks up and can take hours. These commits optimize that initial build when creating an ExLlamaV2 LMFE filter.
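For context, a minimal sketch (not the actual LMFE/ExLlamaV2 code) of the kind of structure involved: a prefix tree built once over the tokenizer vocabulary before constrained generation can begin. With vocabularies in the 250k+ range, this one-time build cost is what dominates startup.

```python
def build_token_trie(vocab):
    """Map each token string into a nested-dict trie in a single pass.

    `vocab` is a hypothetical {token_id: token_string} mapping used
    here only for illustration.
    """
    root = {}
    for token_id, token in vocab.items():
        node = root
        for ch in token:
            node = node.setdefault(ch, {})
        node[None] = token_id  # mark end of a complete token with its id
    return root

# Toy vocabulary; large-vocab models make this loop run hundreds of
# thousands of times, so per-insertion overhead matters.
trie = build_token_trie({0: "th", 1: "the", 2: "there"})
assert trie["t"]["h"][None] == 0
assert trie["t"]["h"]["e"][None] == 1
```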
Tests: Running Command R with a JSON schema in TabbyAPI on LMFE v0.9.5, generation would not start. With these commits, generation starts immediately.
References #75
Thanks to @turboderp for creating these commits.