pigeonai-org / ViDove

🐦ViDove: RAG-Augmented End-to-end Multimodal Translation Agent
GNU General Public License v3.0

New Module: audio segmentation module #11

Status: Open. Opened by yichen14 8 months ago.

yichen14 commented 8 months ago

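Implement VAD (voice activity detection) first, referring to Stable-Whisper and WhisperX:
- stable-ts (Stable-Whisper): https://github.com/jianfch/stable-ts
- WhisperX: https://github.com/m-bain/whisperX
- Silero VAD: https://github.com/snakers4/silero-vad
- WhisperX paper: https://arxiv.org/abs/2303.00747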
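Before wiring in Silero VAD or WhisperX, the segmentation step could be prototyped with a plain energy-threshold VAD. The sketch below is an illustrative stand-in. It is not Silero's neural model and not WhisperX's code. `energy_vad`, `pack_chunks`, and every threshold are hypothetical, and the input is assumed to be mono float samples in [-1.0, 1.0]. The sketch flags 30 ms frames whose RMS energy exceeds a threshold, bridges short pauses, and drops very short blips. It then greedily packs the speech regions into chunks of at most 30 s, loosely following WhisperX's idea of feeding Whisper VAD-bounded windows no longer than its 30-second input.

```python
import math
from typing import List, Sequence, Tuple

# (start_seconds, end_seconds)
Segment = Tuple[float, float]


def frame_rms(samples: Sequence[float], frame_len: int) -> List[float]:
    """RMS energy of consecutive, non-overlapping frames (the last one may be short)."""
    energies = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return energies


def energy_vad(samples: Sequence[float], sample_rate: int, frame_ms: int = 30,
               threshold: float = 0.02, min_silence_ms: int = 300,
               min_speech_ms: int = 250) -> List[Segment]:
    """Hypothetical helper: return speech regions of mono float audio in [-1, 1]."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    voiced = [e >= threshold for e in frame_rms(samples, frame_len)]

    # 1. Collect runs of voiced frames as [first_frame, end_frame_exclusive].
    runs: List[List[int]] = []
    start = None
    for i, is_voiced in enumerate(voiced):
        if is_voiced and start is None:
            start = i
        elif not is_voiced and start is not None:
            runs.append([start, i])
            start = None
    if start is not None:
        runs.append([start, len(voiced)])

    # 2. Bridge pauses shorter than min_silence_ms so one utterance stays whole.
    merged: List[List[int]] = []
    for run in runs:
        if merged:
            gap_s = (run[0] - merged[-1][1]) * frame_len / sample_rate
            if gap_s < min_silence_ms / 1000:
                merged[-1][1] = run[1]
                continue
        merged.append(run)

    # 3. Drop blips shorter than min_speech_ms and convert frame indices to seconds.
    segments: List[Segment] = []
    for s, e in merged:
        if (e - s) * frame_len / sample_rate >= min_speech_ms / 1000:
            end_sample = min(e * frame_len, len(samples))
            segments.append((s * frame_len / sample_rate, end_sample / sample_rate))
    return segments


def pack_chunks(segments: Sequence[Segment], max_chunk_s: float = 30.0) -> List[Segment]:
    """Hypothetical helper: greedily merge consecutive segments into chunks <= max_chunk_s."""
    chunks: List[Segment] = []
    for start, end in segments:
        # A single segment longer than the limit is cut into fixed-length pieces.
        while end - start > max_chunk_s:
            chunks.append((start, start + max_chunk_s))
            start += max_chunk_s
        # Extend the previous chunk if the result still fits, otherwise start a new one.
        if chunks and end - chunks[-1][0] <= max_chunk_s:
            chunks[-1] = (chunks[-1][0], end)
        else:
            chunks.append((start, end))
    return chunks
```

Replacing `energy_vad` with Silero VAD later would leave `pack_chunks` untouched, as long as Silero's output is first converted to (start, end) pairs in seconds.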