microsoft / llguidance

Low-level Guidance Parser
MIT License

only allowing valid tokenizations in grammars #1

Open · mmoskal opened this issue 2 months ago

mmoskal commented 2 months ago

See https://vivien000.github.io/blog/journal/llm-decoding-with-regex-constraints.html and https://github.com/vivien000/regex-constrained-decoding/blob/main/technical_appendix.pdf

Thoughts (unorganized):

The tokens we most need to discard will lie along the forced path; for example, after `"` the `,` is forced. Note that if the grammar allows whitespace between `"` and `,`, there is no forced path, and moreover the token `"` should still be allowed (unless there are tokens `" `, `"\n`, `"\t`, etc. covering all of the whitespace; but I think this is very unlikely).
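A minimal sketch of the check described above, assuming a *greedy longest-match* tokenizer (a simplification: real BPE tokenizers apply merge rules, so this only approximates canonical tokenization) and assuming we are at a canonical token boundary. The function names `is_noncanonical_split` and `should_discard` are hypothetical and not part of llguidance's API. With a forced continuation, a token is ruled out as soon as the vocabulary contains that token extended by any non-empty prefix of the forced text. With several allowed next characters (the whitespace case), the token is ruled out only if *every* allowed character would be absorbed into a longer token.

```python
def is_noncanonical_split(token: str, forced: str, vocab: set[str]) -> bool:
    """Hypothetical helper. True if emitting `token` followed by the forced
    text `forced` cannot be a greedy longest-match split: some vocabulary entry
    equals `token` plus a non-empty prefix of `forced`, and since that text is
    guaranteed to follow, the tokenizer would pick the longer entry instead."""
    return any(token + forced[:k] in vocab for k in range(1, len(forced) + 1))


def should_discard(token: str, next_chars: set[str], vocab: set[str]) -> bool:
    """Hypothetical helper. `next_chars` is the set of characters the grammar
    allows immediately after `token` (end of input is assumed not to be allowed).
    Discard `token` only if every allowed next character forms a longer
    vocabulary entry together with it; otherwise some valid continuation
    keeps `token` as a legitimate token."""
    return bool(next_chars) and all(token + c in vocab for c in next_chars)
```

For example, with a vocabulary containing `"`, `,` and `",`, the token `"` followed by a forced `,` is flagged by `is_noncanonical_split`. If the grammar instead allows `,` or any of space, `\t`, `\n`, `\r`, then `should_discard` keeps `"` unless tokens such as `" `, `"\t`, `"\n` and `"\r` all exist as well, matching the observation that this full coverage is unlikely.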

Transferred from https://github.com/hudson-ai/guidance/issues/13