Closed jbrough closed 1 year ago
A small change to whisper.cpp is also needed to make the set_mel api usable:
Does this depend on the linked PR to whisper.cpp, or can it be merged by itself?
Can be merged by itself as just fixes the n_len calculation. the change to whisper.cpp is needed to make set_mel persistent, but that's the situation now too.
Sounds good
The required change was finally merged into Whisper.cpp:
https://github.com/ggerganov/whisper.cpp/pull/1130#event-10202162782
So if anyone needs to use the set_mel
api in this library, good to go.
See #85 if you need to use it now as well. tl;dr: override your whisper-rs dependency to this repo, branch whisper-8e46ba8
. I'll update here once upstream whisper.cpp has a release and I'll do a release of whisper-rs as well.
Based on how whisper.cpp stores the mel spectrogram internally, this is what we need here: mel.n_len is the stride in the time domain. I've successfully passed the JFK test by passing in a spectrogram alone, but this change is required to make it work.
more info in my implementation:
https://github.com/wavey-ai/mel_spec/blob/3373c64d9e3200485d96c3730b5e6430be074583/src/lib.rs#L141C4-L141C4 https://github.com/wavey-ai/mel_spec/blob/main/src/lib.rs#L475
I'm happy to move some/all of this code into whisper-rs at some point.
set_mel
can't really be used without something like this - it took a while to figure out - but not sure if the stft stuff is a bit out of scope, or not.