Open kurtbuilds opened 8 months ago
👋 I wrote an implementation of constrained sampling with candle here that might be useful as a reference. Here are a few things I found important:
@lucasavila00 It would be great if you could implement your BNF-based model grammar work in Candle.
llama.cpp now supports grammars:
https://til.simonwillison.net/llms/llama-cpp-python-grammars
Is that something that will come to candle?
It sounds like the approach taken in this Python library would be straightforward:
https://github.com/1rgs/jsonformer/blob/main/jsonformer/main.py
Basically, since you know the JSON schema, you emit the structural tokens (braces, keys, quotes, commas) directly from control flow, and only constrain the logit output when the model is generating a typed value.
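To make the idea concrete, here is a minimal Python sketch of that control flow. It is not jsonformer's or candle's actual code: `VOCAB`, `fake_logits`, `constrained_pick`, `generate_value`, and `generate_object` are all hypothetical names, the "model" is a deterministic stand-in for a real forward pass, and the schema is simplified to a flat mapping of key to type name.

```python
import json
import math

# Toy vocabulary standing in for a real tokenizer (hypothetical).
VOCAB = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
         "true", "false", "hello", ",", "}"]
DIGITS = set("0123456789")


def fake_logits(prompt):
    """Stand-in for a model forward pass: deterministic pseudo-logits."""
    return [float((len(prompt) * 7 + i * 13) % 11) for i in range(len(VOCAB))]


def constrained_pick(logits, allowed):
    """Mask every token outside `allowed` to -inf, then take the argmax."""
    masked = [l if VOCAB[i] in allowed else -math.inf
              for i, l in enumerate(logits)]
    best = max(range(len(masked)), key=masked.__getitem__)
    return VOCAB[best]


def generate_value(prompt, kind, max_digits=3):
    """Generate one typed value; only this step consults the logits."""
    if kind == "boolean":
        return constrained_pick(fake_logits(prompt), {"true", "false"})
    if kind == "number":
        out = ""
        for _ in range(max_digits):
            # After the first digit, "," is also allowed and acts as a stop token.
            allowed = DIGITS if not out else DIGITS | {","}
            tok = constrained_pick(fake_logits(prompt + out), allowed)
            if tok == ",":
                break
            out += tok
            if out == "0":  # JSON forbids leading zeros, so stop here
                break
        return out
    raise ValueError(f"unsupported type: {kind}")


def generate_object(schema, prompt=""):
    """Emit braces, keys, and commas from control flow; sample only values."""
    out = "{"
    for n, (key, kind) in enumerate(schema.items()):
        if n:
            out += ", "
        out += f'"{key}": '
        out += generate_value(prompt + out, kind)
    return out + "}"


result = generate_object({"age": "number", "active": "boolean"})
print(result)
print(json.loads(result))  # always parses, whatever the logits were
```

Because the structure never comes from the model, the output parses as JSON regardless of what the logits look like. With a real model, the same masking step would be applied to the logits tensor before sampling.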
I started to work on this approach in a demo codebase... I'll report back on any progress.
Curious to hear from others about how feasible the approach is.