m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License
12.66k stars 1.34k forks

Multiple improvements: language detection per segment, VAD min duration on/off, unique speakers, pyproject.toml and more. #900

Open cvl01 opened 1 month ago

cvl01 commented 1 month ago

WhisperX repo with multiple improvements combined:

Feel free to check out my repo and suggest improvements.

cvl01 commented 1 month ago

I agree with you in the sense that you'd normally open a pull request for one feature at a time. Since this repo is unsupported, I created my own fork with all the changes that are useful for my personal whisperx usage. I did not create it with a pull request in mind, but rather as a fully working, up-to-date whisperx package to be used in various projects. I opened the pull request here so that others can see the changes and check out my repository if they want to see how they are implemented. If you insist, I can close the pull request. But since this repo is unmaintained and no new changes have been merged for a long time, I don't think splitting this into multiple PRs with one change or feature each is worth the effort.

federicotorrielli commented 1 month ago

Hi @cvl01, can you create PRs for my project? I plan to support WhisperX with the help of the community.

https://github.com/federicotorrielli/BetterWhisperX