r2d4 / rellm

Exact structure out of any language model completion.
MIT License

Skip generating tokens if the regex is constant #8

Closed jerome3o closed 1 year ago

jerome3o commented 1 year ago

I've been tinkering with converting JSON Schema to CFGs for use with parserllm, so that my LLM only generates valid JSON for a given schema. This is similar to Jsonformer, except I found it easier to add more of the JSON Schema features this way. The goal is to build really good agents with open-source models.

I've found that much of the CFG consists of large constant strings (the JSON object keys) that don't need to be generated by the LLM. This PR adds a small check for whether a regex is constant, by looking for un-escaped special characters (\.*+?{}()[]|^$). If the regex is constant, rellm just returns the matching string.
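For illustration, the check described above could be sketched roughly as follows. This is not the code from the PR: the names `constant_value` and `complete`, and the `generate` callback standing in for the model call, are hypothetical. It also assumes that a backslash followed by anything other than one of the listed special characters (for example `\d`) should conservatively count as non-constant.

```python
from typing import Callable, Optional

# The characters the PR description lists as regex-special.
SPECIAL_CHARS = set("\\.*+?{}()[]|^$")


def constant_value(pattern: str) -> Optional[str]:
    """Return the single string `pattern` can match, or None if it may vary.

    Hypothetical helper: a pattern counts as constant when every special
    character in it is escaped with a backslash.
    """
    literal = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            # Assumption: only an escaped special character is a literal.
            # Classes such as \d, or a trailing backslash, are non-constant.
            if i + 1 >= len(pattern) or pattern[i + 1] not in SPECIAL_CHARS:
                return None
            literal.append(pattern[i + 1])
            i += 2
        elif ch in SPECIAL_CHARS:
            # Un-escaped special character: the regex can match many strings.
            return None
        else:
            literal.append(ch)
            i += 1
    return "".join(literal)


def complete(pattern: str, generate: Callable[[str], str]) -> str:
    """Skip generation for constant patterns; `generate` is a stand-in
    for whatever function actually samples tokens from the model."""
    constant = constant_value(pattern)
    if constant is not None:
        return constant
    return generate(pattern)
```

With this sketch, a pattern like `"name": ` is returned directly without touching the model, while something like `[0-9]+` still goes through constrained generation.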

This speeds things up significantly in my use case.