tazz4843 / whisper-rs

Rust bindings to https://github.com/ggerganov/whisper.cpp
The Unlicense
659 stars 108 forks source link

calculate mel.n_len as mel spectrogram stride len #75

Closed jbrough closed 1 year ago

jbrough commented 1 year ago

Based on how whisper.cpp stores the mel spectrogram internally, this is what we need here: mel.n_len is the stride in the time domain. I've successfully passed the JFK test by passing in a spectrogram alone, but this change is required to make it work.

more info in my implementation:

https://github.com/wavey-ai/mel_spec/blob/3373c64d9e3200485d96c3730b5e6430be074583/src/lib.rs#L141C4-L141C4 https://github.com/wavey-ai/mel_spec/blob/main/src/lib.rs#L475

I'm happy to move some/all of this code into whisper-rs at some point. set_mel can't really be used without something like this - it took a while to figure out - but not sure if the stft stuff is a bit out of scope, or not.

jbrough commented 1 year ago

A small change to whisper.cpp is also needed to make the set_mel api usable:

https://github.com/ggerganov/whisper.cpp/pull/1130

tazz4843 commented 1 year ago

Does this depend on the linked PR to whisper.cpp, or can it be merged by itself?

jbrough commented 1 year ago

Can be merged by itself as just fixes the n_len calculation. the change to whisper.cpp is needed to make set_mel persistent, but that's the situation now too.

tazz4843 commented 1 year ago

Sounds good

jbrough commented 1 year ago

The required change was finally merged into Whisper.cpp:

https://github.com/ggerganov/whisper.cpp/pull/1130#event-10202162782

So if anyone needs to use the set_mel api in this library, good to go.

tazz4843 commented 1 year ago

See #85 if you need to use it now as well. tl;dr: override your whisper-rs dependency to this repo, branch whisper-8e46ba8. I'll update here once upstream whisper.cpp has a release and I'll do a release of whisper-rs as well.