Open thewh1teagle opened 4 days ago
If you are using CPU, it won't make much difference in speed.
If we process speaker sentences of 5 seconds each, Whisper will still process each one as 30 seconds, right? Also, the GPU is very important with Whisper because it's a heavy model, and that makes a big difference.
I suggest that you have a look at the Moonshine models. It does not require padding.
Unfortunately, it supports only English.
The Whisper model has a 30-second limitation. Can you integrate batched inference into sherpa? I would like to use it together with diarization.
I'm still not sure exactly how to batch it, but I have an idea: use silero-vad to detect speech segments, aggregate segments shorter than 30 seconds into a single 30-second chunk with silence added between them, then use word timestamps to estimate where the silence was inserted and reconstruct each segment's text from the combined transcript.
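A rough sketch of that packing idea (all names, the 0.5 s gap length, and the greedy packing strategy are my assumptions, not sherpa's or silero-vad's API):

```python
# Sketch: pack short VAD segments into <=30 s chunks separated by silence,
# then map Whisper word timestamps back to the original segments.
SAMPLE_RATE = 16000
MAX_CHUNK_S = 30.0
GAP_S = 0.5  # silence inserted between packed segments (assumed value)

def pack_segments(segments):
    """segments: list of (seg_id, samples); each segment assumed < 30 s.
    Returns chunks of (audio, mapping), where mapping records each
    segment's (seg_id, start_s, end_s) span inside the packed chunk."""
    chunks, audio, mapping, pos = [], [], [], 0.0
    gap = [0.0] * int(GAP_S * SAMPLE_RATE)
    for seg_id, samples in segments:
        dur = len(samples) / SAMPLE_RATE
        if audio and pos + GAP_S + dur > MAX_CHUNK_S:
            # Current chunk is full: start a new one.
            chunks.append((audio, mapping))
            audio, mapping, pos = [], [], 0.0
        if audio:
            audio.extend(gap)
            pos += GAP_S
        mapping.append((seg_id, pos, pos + dur))
        audio.extend(samples)
        pos += dur
    if audio:
        chunks.append((audio, mapping))
    return chunks

def assign_words(words, mapping):
    """words: list of (text, start_s) from word-level timestamps.
    Assigns each word back to the segment whose time span contains it."""
    out = {seg_id: [] for seg_id, _, _ in mapping}
    for text, start in words:
        for seg_id, s, e in mapping:
            if s <= start < e + GAP_S / 2:  # tolerate small timestamp drift
                out[seg_id].append(text)
                break
    return out
```

The mapping makes the reconstruction step explicit: since we know where each original segment sits inside the packed 30-second chunk, any word whose timestamp falls in that span belongs to that segment.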
https://github.com/thewh1teagle/loud.cpp/issues/11