noamgat / lm-format-enforcer

Enforce the output format (JSON Schema, Regex etc) of a language model
MIT License

How does the function successfully deal with batch inputs #39

Closed pfZhu closed 9 months ago

pfZhu commented 9 months ago

As I understand it, calling the build_transformers_prefix_allowed_tokens_fn function initializes a single TokenEnforcer instance, which maintains the parsing state of a sequence: [image]

In addition, the Hugging Face transformers code calls this function for each of the sequences in a batch, as in the following picture: [image]

My question: does this mean that one TokenEnforcer instance handles multiple sequences in a batch and maintains the parsing state of all of them? Can the sequences in one batch affect or corrupt each other's parsing states?
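
To make the question concrete, here is one way a single object could serve a whole batch without mixing up sequences: key the parsing state by the token prefix itself instead of by batch index. This is only a minimal sketch of that idea. ToyEnforcer and simulate_batch are hypothetical names invented for illustration and are not lm-format-enforcer's API. The only thing taken from transformers is the callback shape (batch_id, input_ids) -> list of allowed token ids, and plain Python lists stand in for the tensor that transformers would actually pass.

```python
from typing import Callable, Dict, List, Tuple


class ToyEnforcer:
    """Hypothetical stand-in for a format enforcer (not the real TokenEnforcer).

    It forces generated tokens to follow `cycle` in order, and stores its
    state per token prefix, so one instance can serve many sequences.
    """

    def __init__(self, cycle: List[int]) -> None:
        self.cycle = cycle
        # Maps a full token prefix to the number of tokens generated so far.
        self.states: Dict[Tuple[int, ...], int] = {}

    def allowed_tokens(self, token_ids: List[int]) -> List[int]:
        key = tuple(token_ids)
        if key not in self.states:
            parent = key[:-1]
            if parent in self.states:
                # Known prefix plus one new token: advance that prefix's state.
                self.states[key] = self.states[parent] + 1
            else:
                # Unseen prefix: assume it is a fresh prompt.
                self.states[key] = 0
        step = self.states[key]
        return [self.cycle[step % len(self.cycle)]]


def build_prefix_fn(enforcer: ToyEnforcer) -> Callable[[int, List[int]], List[int]]:
    # Same shape as a prefix_allowed_tokens_fn callback; batch_id is ignored
    # because the state lookup depends only on the tokens seen so far.
    def prefix_fn(batch_id: int, input_ids: List[int]) -> List[int]:
        return enforcer.allowed_tokens(list(input_ids))

    return prefix_fn


def simulate_batch(prompts: List[List[int]], steps: int, cycle: List[int]) -> List[List[int]]:
    """Simplified imitation of a generation loop calling the callback per row."""
    prefix_fn = build_prefix_fn(ToyEnforcer(cycle))
    rows = [list(p) for p in prompts]
    for _ in range(steps):
        for batch_id, row in enumerate(rows):
            allowed = prefix_fn(batch_id, row)
            row.append(allowed[0])  # pretend the model picked the first allowed token
    return rows


if __name__ == "__main__":
    print(simulate_batch([[1, 2], [3, 4, 5]], steps=3, cycle=[10, 11, 12]))
```

In this sketch, rows with different prompts never share a dictionary key, so they cannot disturb each other. Rows with identical prefixes do share an entry, which is harmless here because the state is a pure function of the prefix. Whether the real TokenEnforcer works this way is exactly what I would like to confirm.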