hayabhay / frogbase

Transform audio-visual content into navigable knowledge.
https://frogbase.dev
MIT License
781 stars · 95 forks

(Feature request) Voice Activity Detector #31

Open · NivaucchuRabuessa opened this issue 1 year ago

NivaucchuRabuessa commented 1 year ago

Hello! I'm coming from your post on r/MachineLearning. Japanese transcriptions are noticeably more accurate when a voice activity detector (VAD) is run before transcription, and that is the only reason I keep using a very simple WebUI. Do you have any plans to integrate a VAD?

Links for reference:
VAD: https://github.com/snakers4/silero-vad
WebUI I'm currently using: https://github.com/openai/whisper/discussions/397
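For context, a VAD finds the stretches of audio that contain speech so the transcriber only sees those stretches; with Whisper this is commonly reported to cut down on text hallucinated during long silences. The sketch below is a deliberately crude energy-threshold detector written for illustration only. It is not Silero VAD (which uses a trained neural model) and not what frogbase ended up using; the function name `detect_speech` and all threshold values are hypothetical choices.

```python
import math
from typing import List, Tuple


def detect_speech(
    samples: List[float],
    sample_rate: int,
    frame_ms: int = 30,
    threshold: float = 0.02,
    min_gap_ms: int = 300,
    pad_ms: int = 60,
) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) spans whose frame RMS reaches `threshold`.

    Hypothetical energy-based VAD for illustration; not Silero VAD.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)

    # 1. Label each fixed-size frame as voiced or silent by its RMS energy.
    voiced = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        voiced.append(rms >= threshold)

    # 2. Collect runs of consecutive voiced frames as [first, last + 1) indices.
    runs: List[List[int]] = []
    for i, is_voiced in enumerate(voiced):
        if not is_voiced:
            continue
        if runs and runs[-1][1] == i:
            runs[-1][1] = i + 1  # extend the current run
        else:
            runs.append([i, i + 1])  # start a new run

    # 3. Merge runs separated by a silence shorter than min_gap_ms.
    min_gap = math.ceil(min_gap_ms / frame_ms)
    merged: List[List[int]] = []
    for run in runs:
        if merged and run[0] - merged[-1][1] < min_gap:
            merged[-1][1] = run[1]
        else:
            merged.append(run)

    # 4. Convert frame indices to seconds, pad each span, clamp to the clip.
    duration = len(samples) / sample_rate
    pad = pad_ms / 1000
    frame_sec = frame_len / sample_rate
    return [
        (max(0.0, a * frame_sec - pad), min(duration, b * frame_sec + pad))
        for a, b in merged
    ]
```

In a real pipeline one would likely transcribe each returned span separately and add the span's start time to the timestamps the transcriber reports, so the final transcript still lines up with the original media.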

hayabhay commented 1 year ago

Will dig into this for the next update! Thanks!

hayabhay commented 1 year ago

Will integrate with pyannote. Bumping this.