support for json (or other?) grammar?

kurtbuilds commented 8 months ago

llama.cpp now supports grammars:

https://til.simonwillison.net/llms/llama-cpp-python-grammars

Is that something that will come to candle?

It sounds like the approach taken in this Python library would be straightforward:

https://github.com/1rgs/jsonformer/blob/main/jsonformer/main.py

Basically, since you know the JSON schema, you emit the structural tokens (braces, keys, punctuation) directly from control flow, and you constrain the logit output only where a typed value has to be generated.
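To make the idea concrete, here is a minimal sketch in plain Python. It is not jsonformer's or candle's code: the toy vocabulary, the logits, and the helper names (`mask_logits`, `is_number_piece`, `pick_number_token`, `generate_object`) are all made up for illustration, and a real implementation would work on token ids and tensors from the model.

```python
import math

def mask_logits(logits, vocab, is_allowed):
    """Return a copy of logits with disallowed tokens set to -inf."""
    return [
        score if is_allowed(token) else -math.inf
        for score, token in zip(logits, vocab)
    ]

def is_number_piece(token):
    # Hypothetical rule: a number value may only contain digits and a dot.
    return token != "" and all(ch in "0123456789." for ch in token)

def pick_number_token(logits, vocab):
    # Greedy choice among the tokens that survive the mask.
    masked = mask_logits(logits, vocab, is_number_piece)
    best = max(range(len(masked)), key=lambda i: masked[i])
    return vocab[best]

def generate_object(schema, sample_value):
    # Structure comes from control flow; only the values come from the model.
    parts = []
    for key, value_type in schema.items():
        parts.append(f'"{key}": {sample_value(key, value_type)}')
    return "{" + ", ".join(parts) + "}"

# Toy vocabulary and logits standing in for a real model's output.
vocab = ["hello", "42", '"', "3.", "}", "7"]
logits = [5.0, 1.2, 4.0, 0.3, 2.0, 1.9]

chosen = pick_number_token(logits, vocab)
result = generate_object({"age": "number"}, lambda key, value_type: chosen)
```

Even though "hello" has the highest raw logit, the mask removes it, so the value slot can only receive a number-like token. The braces, key, and colon never go through sampling at all.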

I started to work on this approach in a demo codebase... I'll report back on any progress.

Curious to hear from others about how feasible the approach is.

ealmloff commented 8 months ago

👋 I wrote an implementation of constrained sampling with candle here that might be useful as a reference. Here are a few things I found important:

Parsing must be incremental if you want reasonable speeds for longer sequences (this makes an FSM a good choice).
You can accelerate text generation by eagerly sampling the grammar and feeding the required next tokens into the LLM in one batch instead of one token at a time.
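Both points can be sketched with a toy character-level FSM in Python. This is an illustration under simplifying assumptions, not the implementation referenced above: the grammar is just a fixed set of accepted strings stored as a trie, and the helper names (`build_trie_fsm`, `advance`, `forced_text`) are hypothetical. A real version would operate on tokenizer tokens and would tokenize the forced text before feeding it to the model in one batch.

```python
def build_trie_fsm(accepted):
    """Build a character-level FSM (a trie) accepting exactly the given strings."""
    transitions = [{}]  # transitions[state][char] -> next state
    for text in accepted:
        state = 0
        for ch in text:
            if ch not in transitions[state]:
                transitions.append({})
                transitions[state][ch] = len(transitions) - 1
            state = transitions[state][ch]
    return transitions

def advance(transitions, state, token):
    """Feed one token starting from the saved state.

    Only the new token is parsed, so the cost does not grow with the
    length of the text generated so far. Returns None if rejected.
    """
    for ch in token:
        state = transitions[state].get(ch)
        if state is None:
            return None
    return state

def forced_text(transitions, state):
    """Collect the characters the grammar forces from this state.

    While a state has exactly one outgoing edge there is nothing to
    sample, so the whole run can be emitted at once.
    """
    out = []
    while len(transitions[state]) == 1:
        ch, state = next(iter(transitions[state].items()))
        out.append(ch)
    return "".join(out), state

fsm = build_trie_fsm(['{"ok": true}', '{"ok": false}'])

prefix, state = forced_text(fsm, 0)       # everything before the first real choice
state = advance(fsm, state, "tr")         # a sampled token, parsed incrementally
suffix, end_state = forced_text(fsm, state)
rejected = advance(fsm, 0, "x")           # a token the grammar does not allow
```

Here the only point where the model actually has to choose is between `t` and `f`; the rest of the output is determined by the grammar and can be skipped ahead.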

andrewlimmer commented 2 months ago

@lucasavila00 It would be great if you could implement your model grammar work via BNF in Candle.

huggingface / candle

support for json (or other?) grammar? #1945