victor-upmeet / whisperx-replicate


Implement whisper-timestamped #10

Open villesau opened 2 months ago

villesau commented 2 months ago

Hi, https://github.com/linto-ai/whisper-timestamped seems like an interesting approach for accurate timestamps, and apparently it would not have problems with numerics and so on. Would it be a big effort to implement a Replicate endpoint for that too?

Huanshere commented 1 month ago

I fully tested and compared these two methods in my project VideoLingo. I gotta say the timestamps from WhisperX are way more stable than those from whisper-timestamped. WhisperX addresses Whisper's inherent hallucination issue through forced alignment.

villesau commented 1 month ago

Yep, I noticed the same in the end: whisper-timestamped's timestamps were very far from accurate. https://github.com/jianfch/stable-ts seems better than that at least. I didn't test it against WhisperX yet, but it does not suffer from the numerics problem that WhisperX suffers from, and it is way better than whisper-timestamped.

Huanshere commented 1 month ago

Thanks for sharing, stable-ts looks so gooood and it deserves 100k stars! It shows how important it is to name your project in an SEO-friendly way ahaha. I'll test it out right away.

villesau commented 1 month ago

Yep it definitely wasn't the first option I found either :) I found it very randomly actually.

Huanshere commented 1 month ago

Tested, just so perfect, I can't ask for more... What surprised me is that it doesn't need a language-specific wav2vec model to perform the forced alignment, which makes it super fast and super light. I will definitely replace WhisperX with stable-ts in my project ahaha. But unfortunately stable-ts on Replicate is not up-to-date, so I may need to pack one myself. Thanks again for sharing this 👍