noamgat / lm-format-enforcer

Enforce the output format (JSON Schema, Regex etc) of a language model
MIT License

Supporting of streaming #49

Closed: MarcusTXK closed this 9 months ago

MarcusTXK commented 9 months ago

Apologies if this is a silly question. This library works really well at enforcing the output format, thanks a lot for creating it! I am building a home assistant whose response is a JSON object with various fields; the important one is the msg field, which is read out to the user as the reply.

Streaming the response would let me get msg sooner, since it could be read out while the rest of the JSON is still being generated. Could I check whether streaming with LLamaCPP is already supported, or whether it is planned for the future?

For example, LlamaIndex provides the stream_complete endpoint to stream the response as it is generated rather than waiting for the entire response to finish (docs here), but this capability is lost when we pass the LLM into LMFormatEnforcerPydanticProgram.
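Concretely, what I'd like is to start reading msg out while the rest of the JSON is still arriving. A minimal sketch of that consumer side, in plain Python with a simulated chunk stream (the chunk boundaries and field layout are just illustrative):

```python
def iter_msg_chars(chunks):
    """Yield the characters of a top-level "msg" string field as soon as
    they arrive, while the rest of the JSON is still streaming in.

    Illustrative sketch only: assumes "msg" holds a plain string, and
    passes backslash escape sequences through verbatim.
    """
    buf = ""          # everything received so far
    start = None      # index of the first character inside the msg string
    emitted = 0       # number of msg characters already yielded
    done = False
    for chunk in chunks:
        buf += chunk
        if start is None:
            key = buf.find('"msg"')
            if key != -1:
                colon = buf.find(":", key + 5)
                quote = buf.find('"', colon + 1) if colon != -1 else -1
                if quote != -1:
                    start = quote + 1
        if start is None or done:
            continue
        i = start + emitted
        while i < len(buf):
            c = buf[i]
            if c == "\\":
                if i + 1 >= len(buf):
                    break            # escape split across chunks: wait
                yield buf[i:i + 2]   # pass the escape pair through as-is
                i += 2
            elif c == '"':
                done = True          # closing quote: msg is complete
                break
            else:
                yield c
                i += 1
        emitted = i - start
```

So even with the JSON arriving in arbitrary pieces, e.g. `['{"intent": "light', 's_on", "ms', 'g": "Turning the li', 'ghts on", "room": "kitchen"}']`, joining the yielded characters gives `"Turning the lights on"` well before the trailing fields arrive.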

noamgat commented 9 months ago

I assume you are using LlamaIndex, correct? If so, there are two ways to use LMFormatEnforcer: one is through LMFormatEnforcerPydanticProgram; the other is the activate_lm_enforcer route, used directly on an existing LLamaCPP instance. See the regular expression example for more information; you can use it with JsonSchemaParser instead of RegexParser, and it should work with stream_complete as well.
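On the consuming side, stream_complete is just iterated chunk by chunk, appending each delta as it arrives. A rough sketch, with a stub generator standing in for a real LLamaCPP-backed LLM (the .delta attribute here mirrors how LlamaIndex surfaces newly generated text, treated as an assumption in this sketch):

```python
import json
from dataclasses import dataclass
from typing import Iterator


@dataclass
class StreamChunk:
    delta: str  # newly generated text (stand-in for the real streamed response object)


def fake_stream_complete(full_text: str, size: int = 6) -> Iterator[StreamChunk]:
    """Stub for llm.stream_complete(...): yields the response in small pieces."""
    for i in range(0, len(full_text), size):
        yield StreamChunk(delta=full_text[i:i + size])


response = '{"msg": "Lights are on", "device": "lamp"}'
parts = []
for chunk in fake_stream_complete(response):
    parts.append(chunk.delta)   # hand each delta to the TTS pipeline here
full = "".join(parts)
payload = json.loads(full)      # the completed JSON still parses at the end
```

Because the format enforcer constrains generation token by token, the accumulated text is guaranteed to converge to valid JSON by the time the stream ends.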

MarcusTXK commented 9 months ago

Okay, thanks a lot for pointing me in the right direction!