noamgat / lm-format-enforcer

Enforce the output format (JSON Schema, Regex etc) of a language model
MIT License
1.42k stars, 65 forks

Is there a way to enforce a LLM to generate a JSON without excessive whitespaces? #52

Closed LorrinWWW closed 5 months ago

LorrinWWW commented 9 months ago

Can we force an LLM to generate more tightly structured JSON? Currently, it can produce excessive whitespace and occasionally ends with an EOS token without even closing the bracket. (I found that this is actually due to an issue in my schema, but I am still curious about how to further constrain the model.) Can we limit the LLM's flexibility so that it only generates valid JSON objects, without excessive whitespace? Thank you so much!

noamgat commented 9 months ago

It's currently not exposed via an API, but if you fork the project and change MAX_CONSECUTIVE_WHITESPACES = 12 in consts.py to something else (probably 0), you should get the behavior you are looking for. I'm leaving this issue open and flagging it as an enhancement to see if others are interested in it.
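To illustrate what a cap on consecutive whitespace does during constrained decoding, here is a minimal, self-contained sketch. It is not the library's implementation: the helper names (`trailing_whitespace`, `longest_whitespace_run`, `filter_candidates`) and the string-based candidate tokens are hypothetical, chosen only to show the idea of rejecting any next token that would push a whitespace run past the limit.

```python
# Illustrative sketch only; not lm-format-enforcer's actual code.
# All function names here are hypothetical.

JSON_WHITESPACE = " \t\n\r"


def trailing_whitespace(text: str) -> str:
    """Return the run of whitespace characters at the end of `text`."""
    return text[len(text.rstrip(JSON_WHITESPACE)):]


def longest_whitespace_run(text: str) -> int:
    """Length of the longest run of consecutive whitespace in `text`."""
    longest = current = 0
    for ch in text:
        current = current + 1 if ch in JSON_WHITESPACE else 0
        longest = max(longest, current)
    return longest


def filter_candidates(generated: str, candidates: list[str], max_ws: int) -> list[str]:
    """Keep only candidate tokens that never exceed `max_ws` consecutive
    whitespace characters when appended to `generated`."""
    tail = trailing_whitespace(generated)
    return [tok for tok in candidates if longest_whitespace_run(tail + tok) <= max_ws]


if __name__ == "__main__":
    tokens = [" ", "  ", "1", " 1", "\n  "]
    print(filter_candidates('{"a":', tokens, max_ws=1))
    print(filter_candidates('{"a":', tokens, max_ws=0))
```

With a limit of 0, only tokens containing no whitespace at all survive the filter, which is the effect the suggested change to the constant is meant to have on the generated JSON.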

JoshC8C7 commented 9 months ago

+1 for exposing the consts.py params! I can make a PR if I get a second.

QuangBK commented 8 months ago

+1

kongjiellx commented 8 months ago

+1

fergusbarratt commented 8 months ago

+1

davidsyoung commented 7 months ago

+1

juanhuguet commented 5 months ago

+1

noamgat commented 5 months ago

Released in v0.10.1. Can you check if you can now solve the problem via Configuration Options?

noamgat commented 5 months ago

This should be supported now. Please reopen if the issue is not solved.