rustformers / llm

[Unmaintained, see README] An ecosystem of Rust libraries for working with large language models
https://docs.rs/llm/latest/llm/
Apache License 2.0

Constrained generation support #235

Open philpax opened 1 year ago

philpax commented 1 year ago

This is an open-ended issue; I expect there will be more than one solution to this.

There have been a couple of existing solutions for constraining the output of generations, such as jsonformer and Guidance.

The idea's pretty simple: the user supplies some kind of schema, and generation is then forced to match it by only sampling or feeding tokens that fit the schema. jsonformer is a good place to look for this: it feeds in the JSON structure up to the point where the LLM should generate something, then samples only the tokens that would be valid in that context.
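
Roughly, that loop might look something like the sketch below. Note that `feed` and `generate_value` are hypothetical callbacks, not part of llm's API; they stand in for "push literal text into the context without sampling" and "sample a value with invalid tokens masked out".

// Rough sketch of the jsonformer-style loop for an object with known fields.
fn fill_object_schema(
    feed: &mut dyn FnMut(&str),
    generate_value: &mut dyn FnMut(&str) -> String,
    fields: &[&str],
) -> String {
    let mut out = String::from("{");
    for (i, field) in fields.iter().enumerate() {
        if i > 0 {
            out.push(',');
        }
        // The fixed structure is fed verbatim; the model never has to emit it.
        let prefix = format!("\"{field}\": ");
        feed(&prefix);
        out.push_str(&prefix);
        // Only the value itself is sampled, under whatever constraint applies.
        let value = generate_value(field);
        feed(&value);
        out.push_str(&value);
    }
    out.push('}');
    out
}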

Given that there are many potential ways to solve this problem, and many potential output formats, I'm not sure we should bake in one particular solution. My feeling is that we should offer additional crates for this kind of work rather than baking it into llm itself.

An example might be an llm-json crate, which extends InferenceSession with a trait that takes any serde-able type and produces structured output:

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct Steps {
    steps: Vec<String>,
}

// Hypothetical API: `infer_json` would substitute the type's JSON schema for
// [[SCHEMA]], constrain sampling to that schema, and deserialize the result.
let steps = session.infer_json::<Steps>(
    /* ... */,
    format!("The following paragraph describes a process.\n{{paragraph}}\nPlease transcode it to JSON using the following schema: [[SCHEMA]]"),
)?;

dbg!(steps.steps);

This could also potentially live in llm-chain (and might be a better fit there), but I'm not sure if their abstraction allows for controlled sampling like this. Would need to chat with them.

philpax commented 1 year ago

It should be possible to implement this on top of our existing sampler abstraction. For the JSON case, we'd set up a JSON parsing state machine and only sample tokens that are valid from the current parser state.
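
As a very rough sketch (the names below are illustrative, not part of llm or its sampler trait), the parser only needs to answer one question per candidate token: is this text valid from the current state?

// Minimal illustration of the state-machine check; a real implementation
// would cover the full JSON grammar and advance the state byte by byte.
enum JsonState {
    ExpectValue,      // start of a value: object, array, string, number, ...
    InString,         // inside a string literal
    ExpectCommaOrEnd, // after a complete value inside an object/array
}

fn token_is_valid(state: &JsonState, token_text: &str) -> bool {
    let Some(first) = token_text.chars().next() else {
        return false;
    };
    match state {
        JsonState::ExpectValue => {
            matches!(first, '{' | '[' | '"' | '-' | '0'..='9' | 't' | 'f' | 'n')
        }
        // Inside a string almost anything is fine until the closing quote.
        JsonState::InString => !first.is_control(),
        JsonState::ExpectCommaOrEnd => matches!(first, ',' | '}' | ']'),
    }
}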

In the long run I would like to have a port of Guidance or similar as it's much more general, but I'm not sure how much work would be involved there.

michael-dm commented 1 year ago

Hello @philpax, have you seen https://github.com/ggerganov/llama.cpp/pull/1773 on this topic?

philpax commented 1 year ago

Hi there! Yes, I have - it's impressive, but quite specific to llama.cpp's needs. With #359 landing soon, we'll have a modular sampling solution where these kinds of constraints can hopefully be defined in a reusable fashion.

Reichenbachian commented 10 months ago

Has there been any recent movement on this? I'm hoping to do some constrained JSON generation using this crate. Is this the right package to push for this or should I be looking at the llm-samplers crate?

philpax commented 10 months ago

Hi there! Unfortunately, it's not a priority; our current focus is on catching up to llama.cpp and the rest of the ecosystem. You may be able to implement this yourself; @KerfuffleV2 may also have some ideas as to how to implement this with llm-samplers.

KerfuffleV2 commented 10 months ago

This might help you: https://github.com/KerfuffleV2/llm-samplers/pull/7#issuecomment-1783980172 (See the third item.)

Note that I didn't look at it closely, so I can't explain it in any detail. I do hope to have something like that in llm-samplers eventually, but it doesn't exist yet. One thing that pretty much has to happen first is a resource system overhaul.

If you want to try to implement it yourself (as a standalone thing or as a Sampler in llm-samplers), probably the simplest way is to have some kind of parser and then just ban every token that doesn't match the parser's current state. I believe this is basically how llama.cpp's grammar sampler works too. Once you've banned everything that doesn't conform to the grammar, you can let the normal samplers run - a sketch of that banning step follows.
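
This isn't llm-samplers' actual trait, just the banning step on plain logits; `decode` and `accepts` are stand-ins for the tokenizer and the parser/grammar state.

// Standalone sketch: set the logit of every non-conforming token to -inf so it
// can never be picked, then let the usual samplers (top-k, temperature, ...) run.
fn mask_invalid_tokens(
    logits: &mut [f32],
    decode: impl Fn(usize) -> String,
    accepts: impl Fn(&str) -> bool,
) {
    for (token_id, logit) in logits.iter_mut().enumerate() {
        if !accepts(&decode(token_id)) {
            *logit = f32::NEG_INFINITY;
        }
    }
}

Since the bans only depend on the parser state, whatever sampler chain runs afterwards doesn't need to know about the grammar at all.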