richardanaya / gbnf

A library for working with GBNF files
MIT License
19 stars 0 forks source link

Implement efficient optional repetitions #1

Open ShelbyJenkins opened 3 months ago

ShelbyJenkins commented 3 months ago

First of all, great project and very useful. It has a lot of promise and probably should be integrated with mistral-rs and other rust llm projects.

I noticed that it doesn't follow llama.cpp's method for nesting optional repetitions: https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md

Efficient optional repetitions

A common pattern is to allow repetitions of a pattern x up to N times.

While semantically correct, the syntax x? x? x?.... x? (with N repetitions) will result in extremely slow inference. Instead, you can write (x (x (x ... (x)?...)?)?)? (w/ N-deep nesting)

For example this will generate:

item item item item? item? item?

fn build_item_frequency(min: u16, max: u16) -> Vec<ProductionItem> {
    let mut items = vec![];
    let optional_count = cmp::max(max, min) - min;

    for _ in 0..min {
        items.push(ProductionItem::NonTerminal(
            NonTerminalSymbol {
                name: "item".to_string(),
            },
            RepetitionType::One,
        ))
    }
    for _ in 0..optional_count {
        items.push(ProductionItem::NonTerminal(
            NonTerminalSymbol {
                name: "item".to_string(),
            },
            RepetitionType::ZeroOrOne,
        ))
    }
    items
}

Where as this will generate:

item item item (item (item (item)?)?)?

fn build_item_frequency(min: u16, max: u16) -> String {
    let mut item_rule = String::from("");
    let optional_count = cmp::max(max, min) - min;

    for _ in 0..min {
        item_rule.push_str("item");
    }
    for _ in 0..optional_count {
        item_rule.push_str("(item");
    }
    for _ in 0..optional_count {
        item_rule.push_str(")?");
    }

    item_rule
}

I've tested them both out, and the former is very slowly, and in some cases won't run complete at all. While the later doesn't seem to slow down at all. I don't know the status of this project, but it seems like it would be simple enough to implement. If you're not interested in handling it, I could take a look at it. But I figure handling this type of behavior might be more complex than it sounds and require fully understanding the engine you've built, and I'm a bit busy rn!

Anyways, looking forward to continuing using this project!

richardanaya commented 3 months ago

thanks for the comments! I haven't taken a look at this project in a bit, but I can think about some of the things you asked and/or open to PR request!