Better speaking rate estimation with long silence removal

huggingface / dataspeech

MIT License


Better speaking rate estimation with long silence removal #6

Closed: gizemt closed this issue 4 months ago

gizemt commented 7 months ago

Is anyone working on “Better speaking rate estimation with long silence removal”?

I’m playing with a potential solution that uses Brouhaha-vad (same as here) to detect speech segments, then computes the speaking rate as #phonemes / speech duration, where the speech duration excludes the detected silences. If that approach works for you, I can test the code and create a PR. If there’s another approach in mind, I’d be happy to help with that as well.
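To make the proposal concrete, here is a minimal sketch of the rate computation, assuming the VAD step has already produced a list of `(start, end)` speech segments in seconds and that a phoneme count is available for the utterance. The function names `speech_duration` and `speaking_rate` are hypothetical and are not taken from dataspeech or Brouhaha; the VAD call itself is deliberately left out.

```python
from typing import Iterable, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds, as assumed VAD output


def speech_duration(segments: Iterable[Segment]) -> float:
    """Return the total seconds of speech, merging overlapping segments."""
    total = 0.0
    cur_start = cur_end = None
    for start, end in sorted(segments):
        if end <= start:
            continue  # skip empty or inverted segments
        if cur_end is None or start > cur_end:
            # A gap (silence) before this segment: close the previous one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlaps or touches the current segment: extend it.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def speaking_rate(num_phonemes: int, segments: Iterable[Segment]) -> float:
    """Phonemes per second of detected speech (hypothetical helper)."""
    duration = speech_duration(segments)
    if duration <= 0:
        return 0.0  # no speech detected; avoid dividing by zero
    return num_phonemes / duration


# A 10 s clip with 40 phonemes and two speech segments of 2 s each.
segments = [(0.5, 2.5), (6.0, 8.0)]
print(speaking_rate(40, segments))  # 10.0 phonemes/s over speech only
print(40 / 10.0)                    # 4.0 phonemes/s over the whole clip
```

The gap between the two numbers shows why long silences matter: dividing by the full clip length underestimates the rate whenever the recording contains long pauses.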

ylacombe commented 7 months ago

Hey @gizemt, as far as I know, nobody is working on this yet! Thanks for opening this issue! This approach seems the most reasonable to me, so feel free to test it and open a PR! Also, it'd be great to have examples of how to make this work and how efficient it is, if you can!

BTW, do you think VAD with Brouhaha is more efficient than using other VAD libraries (such as pyannote)?

ylacombe commented 4 months ago

Done in #24!