Open vilsonrodrigues opened 7 months ago
Also reached the same conclusion...
Half related: have you found out how to get a speaker ID? It's important in realtime conversation over a loudspeaker, to ignore the AI's own audio as input, and it's also the best way to quickly trigger a user interruption that stops the current AI playback.
VAD + speech embed model + cosine similarity
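A minimal sketch of the last step only, assuming the VAD has already cut out a speech segment and some speech embedding model has already turned it into a vector. The function names, the plain-list embedding format, and the 0.75 threshold are all illustrative assumptions, not taken from any specific library; the threshold in particular would need tuning on real audio.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction, so treat it as "no match".
        return 0.0
    return dot / (norm_a * norm_b)


def is_same_speaker(
    segment_embedding: Sequence[float],
    reference_embedding: Sequence[float],
    threshold: float = 0.75,  # assumed placeholder value, tune on real data
) -> bool:
    """True if the segment is close enough to the enrolled reference speaker."""
    return cosine_similarity(segment_embedding, reference_embedding) >= threshold
```

With a reference embedding enrolled for the user (or for the AI voice), each VAD segment can be compared against it: a segment that matches the AI voice is dropped, and a segment that matches the user can be used to stop playback.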
Have you managed to do this in realtime? How instant is it? Hope it's lightning fast. Got any repos or resources to share, to help me and others? Thanks @vilsonrodrigues
Hello Mustafa,
Could you add **kwargs to transcribe?
This would allow passing extra parameters through to "transcribe_audio", such as temperature, no_speech_threshold, etc.
https://github.com/mustafaaljadery/lightning-whisper-mlx/blob/main/lightning_whisper_mlx/lightning.py#L90
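To make the request concrete, here is a rough sketch of the forwarding pattern. Both `Transcriber` and the `transcribe_audio` stub below are hypothetical stand-ins written for this example; they are not the library's actual code or signatures, and the default values are placeholders.

```python
from typing import Any, Dict


def transcribe_audio(
    audio_path: str,
    temperature: float = 0.0,
    no_speech_threshold: float = 0.6,
) -> Dict[str, Any]:
    """Hypothetical stand-in for the lower-level function; it only echoes its options."""
    return {
        "audio_path": audio_path,
        "temperature": temperature,
        "no_speech_threshold": no_speech_threshold,
    }


class Transcriber:
    """Hypothetical wrapper class used only to illustrate the pattern."""

    def transcribe(self, audio_path: str, **kwargs: Any) -> Dict[str, Any]:
        # Extra keyword arguments are passed through unchanged, so callers
        # can set options the wrapper does not name explicitly.
        return transcribe_audio(audio_path, **kwargs)


result = Transcriber().transcribe("audio.wav", temperature=0.2)
```

Options the caller does not pass keep their defaults in the lower-level function, so existing calls would behave as before.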