Multiple improvements: language detection per segment, VAD min duration on/off, unique speakers, pyproject.toml and more.

cvl01 commented 1 month ago

A WhisperX fork that combines multiple improvements:

Added Silero VAD (from #888)
Diarization improvements (from #590)
Unique speakers added to the result (inspired by #126)
Option to detect the language per segment, which is very useful for long audio with frequent language switches.
Replaced setup.py with pyproject.toml
Added VAD minimum-duration-on and minimum-duration-off parameters to the PyAnnote VAD. The current implementation splits even on sub-second pauses, which is sometimes rather ineffective.
Bumped pyannote.audio to 3.3.2
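To illustrate what the min duration on/off parameters do, here is a minimal sketch of the usual post-processing on VAD output, modeled on the behaviour pyannote documents for its binarization step: gaps shorter than `min_duration_off` are filled, then speech regions shorter than `min_duration_on` are dropped. The function name `apply_min_durations` and the `(start, end)` tuple format are assumptions for illustration; this is not the code from the fork or from pyannote.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end) of a speech region, in seconds.
Segment = Tuple[float, float]


def apply_min_durations(
    segments: List[Segment],
    min_duration_on: float = 0.0,
    min_duration_off: float = 0.0,
) -> List[Segment]:
    """Sketch of VAD smoothing: fill short pauses, then drop short speech bursts."""
    if not segments:
        return []
    ordered = sorted(segments)
    merged: List[Segment] = [ordered[0]]
    for start, end in ordered[1:]:
        prev_start, prev_end = merged[-1]
        if start - prev_end < min_duration_off:
            # Pause shorter than min_duration_off: treat it as continuous speech.
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    # Remove speech regions shorter than min_duration_on.
    return [(s, e) for s, e in merged if e - s >= min_duration_on]


if __name__ == "__main__":
    vad_output = [(0.0, 1.0), (1.2, 2.0), (3.0, 3.1), (5.0, 6.0)]
    print(apply_min_durations(vad_output, min_duration_on=0.2, min_duration_off=0.5))
```

With `min_duration_off=0.5`, the 0.2 s pause between the first two regions no longer causes a split, and the 0.1 s blip at 3.0 s is discarded, giving `[(0.0, 2.0), (5.0, 6.0)]`. With both parameters at 0 the input passes through unchanged, which corresponds to the "splits on every pause" behaviour described above.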
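A rough idea of what "unique speakers in the result" can look like: after speaker assignment, each transcript segment may carry a `speaker` label, and the unique labels can be collected in order of first appearance. This sketch assumes a result dict shaped like `{"segments": [{"text": ..., "speaker": ...}, ...]}`; the helper `unique_speakers` and the `"speakers"` key it is stored under are hypothetical and may differ from how the fork stores them.

```python
from typing import Any, Dict, List


def unique_speakers(result: Dict[str, Any]) -> List[str]:
    """Return distinct speaker labels in order of first appearance."""
    seen: Dict[str, None] = {}  # dicts preserve insertion order
    for segment in result.get("segments", []):
        speaker = segment.get("speaker")
        if speaker is not None:  # segments without an assigned speaker are skipped
            seen.setdefault(speaker, None)
    return list(seen)


if __name__ == "__main__":
    result = {
        "segments": [
            {"text": "Hi", "speaker": "SPEAKER_01"},
            {"text": "Hello", "speaker": "SPEAKER_00"},
            {"text": "(no speaker assigned)"},
            {"text": "Bye", "speaker": "SPEAKER_01"},
        ]
    }
    result["speakers"] = unique_speakers(result)  # hypothetical key name
    print(result["speakers"])
```

Keeping first-appearance order (rather than sorting) means the list follows the conversation, which is convenient when mapping labels to real names afterwards.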

Feel free to check out my repo and suggest improvements.

cvl01 commented 1 month ago

I agree that normally you'd open a pull request for one feature at a time. Since this repo is unsupported, I created my own fork with all the changes that are useful for my personal WhisperX usage. I did not create it with a pull request in mind, but rather as a fully working, up-to-date WhisperX package to be used in various projects. I opened this pull request so that others can see the changes and check out my repository if they want to see how they are implemented. If you insist, I can close the pull request. But since this repo is unmaintained and no new changes have been merged for a long time, I don't think splitting it into multiple PRs with one change/feature each is worth the extra effort.

federicotorrielli commented 1 month ago

Hi @cvl01, can you create PRs for my project? I plan to support WhisperX with the help of the community.

https://github.com/federicotorrielli/BetterWhisperX

m-bain/whisperX, pull request #900