tazz4843 / whisper-rs

Rust bindings to https://github.com/ggerganov/whisper.cpp
The Unlicense
607 stars 105 forks source link

Fix set_grammar to set pointer to array of pointers #145

Open jcsoo opened 1 month ago

jcsoo commented 1 month ago

The signature for set_grammar is incorrect - it currently takes Option<Vec<WhisperGrammarElement>>, but whisper_full_params requires

const whisper_grammar_element ** grammar_rules;

i.e. a pointer to an array of grammar rules.

I updated the signature to take Option<Vec<Vec<WhisperGrammarElement>>>, which is then transformed into Vec<Box<[whisper_grammar_element]>>. I then added a member to store the pointers to the individual grammar element slices, and store the pointer to that member in grammar_rules.

I think this should be OK if the self.grammar and self.grammar_pointers are never modified without modifying grammar_rules and n_grammar_rules at the same time. The underlying storage should be freed when FullParams is dropped.

I don't know that this is the best function signature since it requires cloning the grammar, so I'm open to suggestions.

I tested this out and it does appear to work, though I don't know how to figure out if a rule has been fully matched.

jcsoo commented 1 month ago

I've updated set_grammar to take a single Box<[u64]> with the following contents:

0: # of rules 1..N+1: index of rule N N+2..: grammar elements

This allows us to pass around a single self-contained item that doesn't have any self-referential pointers. It's also simple enough so that we don't need dependencies between whisper-rs and the grammar parser, and it can also be used in llama.cpp derivatives.