janhq / ichigo

Local realtime voice AI
Apache License 2.0

planning: Ichigo VAD #91

Open dan-homebrew opened 1 month ago

dan-homebrew commented 1 month ago

Goal

Tasklist

Resources

hahuyhoang411 commented 1 month ago

End-to-end VAD: https://github.com/modelscope/FunASR

PodsAreAllYouNeed commented 3 weeks ago

Hugging Face uses FunASR to support the Paraformer STT model, but uses SileroVAD for voice activity detection. The FSMN-VAD model provided by FunASR could be worth looking into as well. The FunASR pipeline also bundles VAD and diarization together with STT, which could be very useful.

The VAD handler that Hugging Face wrote, which reuses some of the SileroVAD code, is quite nice: https://github.com/huggingface/speech-to-speech/blob/93d74ba3bc3ad1a948cc167d7cdb95699e49d867/VAD/vad_handler.py

It also includes enhancement, which is very useful. We could potentially adapt the handler to support other VADs, which would also help with #93.

Current pipeline: Audio -> Ichigo -> TTS

Pipeline using the hf/s2s handler: Audio -> (VAD -> Enhancement) -> Ichigo -> TTS
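
To make the proposed front end concrete, here is a rough sketch of a (VAD -> Enhancement) stage with a pluggable detector. It is not the hf/s2s handler and not Ichigo code: VADStage, energy_vad and peak_normalize are hypothetical names, the energy threshold is a stand-in for a real model such as SileroVAD or FSMN-VAD, and peak normalization is a stand-in for real enhancement. The only point is that the detector and the enhancer are plain callables, so another VAD can be swapped in without touching the rest of the pipeline.

```python
from typing import Callable, List, Optional, Sequence


def energy_vad(frame: Sequence[float], threshold: float = 0.01) -> bool:
    """Stand-in detector (assumption): mean squared amplitude above threshold."""
    if not frame:
        return False
    return sum(s * s for s in frame) / len(frame) > threshold


def peak_normalize(samples: Sequence[float], target: float = 0.9) -> List[float]:
    """Stand-in enhancement (assumption): rescale so the loudest sample hits target."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    return [s * target / peak for s in samples]


class VADStage:
    """Hypothetical stage: buffers speech frames and emits one utterance at a time."""

    def __init__(
        self,
        is_speech: Callable[[Sequence[float]], bool],
        enhance: Optional[Callable[[Sequence[float]], List[float]]] = None,
        min_silence_frames: int = 2,
    ) -> None:
        self.is_speech = is_speech            # any VAD exposed as frame -> bool
        self.enhance = enhance                # optional utterance-level enhancement
        self.min_silence_frames = min_silence_frames
        self._buffer: List[float] = []
        self._silence = 0

    def push(self, frame: Sequence[float]) -> Optional[List[float]]:
        """Feed one frame; return a finished utterance or None if still listening."""
        if self.is_speech(frame):
            self._buffer.extend(frame)
            self._silence = 0
            return None
        if not self._buffer:
            return None                       # leading silence is dropped
        self._silence += 1
        if self._silence < self.min_silence_frames:
            self._buffer.extend(frame)        # keep short pauses inside the utterance
            return None
        utterance, self._buffer, self._silence = self._buffer, [], 0
        return self.enhance(utterance) if self.enhance else utterance
```

In this sketch, each utterance returned by push() is what would be handed to Ichigo in place of the raw audio stream. Swapping the detector means passing a different callable as is_speech.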

tikikun commented 1 week ago

Great, @nguyenhoangthuan99, you can take this over if you continue working on the Ichigo demo.