noamgat / lm-format-enforcer

Enforce the output format (JSON Schema, Regex etc) of a language model
MIT License

[bug] CharacterLevelParserConfig gets ignored by tokenenforcer #131

Open laurens-gs opened 2 months ago

laurens-gs commented 2 months ago

The following lines in the TokenEnforcer initialization cause any custom CharacterLevelParserConfig to be overridden with default values:

https://github.com/noamgat/lm-format-enforcer/blob/fe6cbf107218839624e3ab39b47115bf7f64dd6e/lmformatenforcer/tokenenforcer.py#L55-L56

I think the approach should be to modify the alphabet attribute of the existing config directly, so that other settings already on it, such as the max array length, are kept.
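A minimal sketch of the difference, using stand-in classes rather than the library's own code. `ParserConfig`, `Parser`, the field names and both `attach_*` helpers are hypothetical and only model the behaviour described above:

```python
from dataclasses import dataclass


@dataclass
class ParserConfig:
    """Hypothetical stand-in for CharacterLevelParserConfig."""
    alphabet: str = "abc"            # assumed default, for illustration only
    max_json_array_length: int = 20  # assumed default, for illustration only


class Parser:
    """Hypothetical stand-in for a character-level parser holding a config."""
    def __init__(self, config: ParserConfig):
        self.config = config


def attach_overwriting(parser: Parser, tokenizer_alphabet: str) -> None:
    # Reported behaviour: a brand-new config replaces the user's config,
    # so every field except the alphabet falls back to its default.
    parser.config = ParserConfig(alphabet=tokenizer_alphabet)


def attach_preserving(parser: Parser, tokenizer_alphabet: str) -> None:
    # Suggested behaviour: update only the alphabet on the existing config,
    # leaving the user's other settings untouched.
    parser.config.alphabet = tokenizer_alphabet


overwritten = Parser(ParserConfig(max_json_array_length=3))
attach_overwriting(overwritten, "xyz")

preserved = Parser(ParserConfig(max_json_array_length=3))
attach_preserving(preserved, "xyz")

print(overwritten.config.max_json_array_length)  # custom value is lost
print(preserved.config.max_json_array_length)    # custom value is kept
```

Mutating in place changes the config object the caller passed in. If that is undesirable, `dataclasses.replace(parser.config, alphabet=...)` would produce an updated copy instead.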

noamgat commented 2 months ago

Thanks for raising this. PRs are welcome; if not, I will try to get to it in a few days.