Open dan-homebrew opened 1 month ago
e2e vad: https://github.com/modelscope/FunASR
Hugging Face uses FunASR to support the Paraformer STT model, but uses SileroVAD for voice activity detection. The FSMN-VAD model provided by FunASR could be worth looking into as well. The FunASR pipeline also bundles VAD and diarization together with STT, which could be very useful.
The VAD handler written by Hugging Face, which uses some of the SileroVAD code, is quite nice: https://github.com/huggingface/speech-to-speech/blob/93d74ba3bc3ad1a948cc167d7cdb95699e49d867/VAD/vad_handler.py
It also includes enhancement, which is very useful. We could adapt the handler to support other VADs too, which could also address #93.
Current pipeline: Audio -> Ichigo -> TTS
Pipeline using the hf/s2s handler: Audio -> (VAD -> Enhancement) -> Ichigo -> TTS
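
To make the proposed flow concrete, here is a minimal sketch of Audio -> (VAD -> Enhancement) -> Ichigo -> TTS with a pluggable VAD. Everything in it is an assumption for illustration: the `VAD` protocol, `EnergyVAD`, `segment_speech`, and `run_pipeline` are hypothetical names and are not taken from the hf/s2s handler, SileroVAD, or FunASR. The energy threshold is only a toy stand-in for a real model such as SileroVAD or FSMN-VAD, and the enhancement, Ichigo, and TTS stages are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Protocol, Sequence

Frame = Sequence[float]  # one fixed-size chunk of PCM samples in [-1.0, 1.0]


class VAD(Protocol):
    """Hypothetical interface: any VAD backend only has to classify one frame."""

    def is_speech(self, frame: Frame) -> bool: ...


@dataclass
class EnergyVAD:
    """Toy stand-in for a real VAD model: thresholds the mean squared amplitude."""

    threshold: float = 0.01

    def is_speech(self, frame: Frame) -> bool:
        if not frame:
            return False
        energy = sum(s * s for s in frame) / len(frame)
        return energy >= self.threshold


def segment_speech(
    frames: Iterable[Frame], vad: VAD, max_silence_frames: int = 2
) -> List[List[float]]:
    """Group speech frames into utterances.

    An utterance is closed once `max_silence_frames` consecutive non-speech
    frames follow it. Non-speech frames themselves are dropped.
    """
    segments: List[List[float]] = []
    current: List[float] = []
    silence = 0
    for frame in frames:
        if vad.is_speech(frame):
            current.extend(frame)
            silence = 0
        elif current:
            silence += 1
            if silence >= max_silence_frames:
                segments.append(current)
                current, silence = [], 0
    if current:  # flush a trailing utterance
        segments.append(current)
    return segments


def run_pipeline(
    frames: Iterable[Frame],
    vad: VAD,
    enhance: Callable[[List[float]], List[float]],
    model: Callable[[List[float]], str],
    tts: Callable[[str], str],
) -> List[str]:
    """Audio -> (VAD -> Enhancement) -> model -> TTS, one output per utterance."""
    return [tts(model(enhance(seg))) for seg in segment_speech(frames, vad)]


if __name__ == "__main__":
    quiet = [0.0] * 4
    loud = [0.5, -0.5, 0.5, -0.5]
    frames = [quiet, loud, loud, quiet, quiet, loud, quiet]
    print(
        run_pipeline(
            frames,
            EnergyVAD(),
            enhance=lambda seg: seg,                  # placeholder: no-op enhancement
            model=lambda seg: f"{len(seg)} samples",  # placeholder for Ichigo
            tts=lambda text: f"tts({text})",          # placeholder for TTS
        )
    )
```

Because the pipeline depends only on `is_speech`, swapping in another VAD would mean writing a small adapter class rather than touching the rest of the flow.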
Great. @nguyenhoangthuan99, you can take this over if you continue working on the Ichigo demo.
Goal
Tasklist
Resources