Open dan-homebrew opened 1 week ago
End-to-end (e2e) VAD: https://github.com/modelscope/FunASR
Hugging Face uses FunASR to support the Paraformer STT model, but relies on SileroVAD for voice activity detection. The FSMN-VAD model provided by FunASR could be worth looking into as well. The FunASR pipeline also bundles VAD and diarization together with STT, which could be very useful.
The VAD handler written by Hugging Face, which reuses some of the SileroVAD code, is quite nice: https://github.com/huggingface/speech-to-speech/blob/93d74ba3bc3ad1a948cc167d7cdb95699e49d867/VAD/vad_handler.py
It also includes enhancement, which is very useful. We could adapt the handler to support other VADs too, which would also address #93.
Current pipeline: Audio -> Ichigo -> TTS
Pipeline using the hf/s2s handler: Audio -> (VAD -> Enhancement) -> Ichigo -> TTS
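
To make the proposed `(VAD -> Enhancement)` stage more concrete, here is a rough sketch of how a handler with a pluggable VAD could look. Everything in it is an assumption for illustration: the `VAD` protocol, `EnergyVAD`, `identity_enhancer`, and `segment_speech` are hypothetical names, not the API of the hf handler, SileroVAD, or FunASR. The energy threshold is only a stand-in so the example runs without any model; a real backend such as SileroVAD or FSMN-VAD would sit behind the same `is_speech` interface.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol, Sequence


class VAD(Protocol):
    """Hypothetical interface: any VAD backend only needs is_speech()."""

    def is_speech(self, frame: Sequence[float]) -> bool: ...


@dataclass
class EnergyVAD:
    """Stand-in VAD using mean squared amplitude (not SileroVAD or FSMN-VAD)."""

    threshold: float = 0.01

    def is_speech(self, frame: Sequence[float]) -> bool:
        if not frame:
            return False
        energy = sum(s * s for s in frame) / len(frame)
        return energy >= self.threshold


def identity_enhancer(segment: List[float]) -> List[float]:
    """Placeholder for the enhancement step; returns the audio unchanged."""
    return segment


def segment_speech(
    frames: Sequence[Sequence[float]],
    vad: VAD,
    enhance: Callable[[List[float]], List[float]] = identity_enhancer,
    min_silence_frames: int = 2,
) -> List[List[float]]:
    """Group speech frames into segments, closing a segment after enough silence."""
    segments: List[List[float]] = []
    current: List[float] = []
    silence = 0
    for frame in frames:
        if vad.is_speech(frame):
            current.extend(frame)
            silence = 0
        elif current:
            silence += 1
            if silence >= min_silence_frames:
                # Enough trailing silence: emit the enhanced segment downstream.
                segments.append(enhance(current))
                current = []
                silence = 0
    if current:
        # Flush speech that was still open when the audio ended.
        segments.append(enhance(current))
    return segments


if __name__ == "__main__":
    loud, quiet = [0.5] * 4, [0.0] * 4
    audio = [quiet, loud, loud, quiet, quiet, loud, quiet]
    for seg in segment_speech(audio, EnergyVAD()):
        # Each segment is what would be handed to Ichigo in the new pipeline.
        print(len(seg))
```

With this shape, swapping VADs means writing a small adapter class that exposes `is_speech`, while the segmentation and enhancement logic stays untouched.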
Goal
Tasklist
Resources