Open vilsonrodrigues opened 2 months ago
also reached the same conclusion...
Half related: have you found out how to get a speaker ID? It's important in realtime conversation over a speaker, both to ignore the AI's own output picked up as input and as the best way to quickly trigger a user interruption that stops the current AI playback.
VAD + speech embed model + cosine similarity
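A minimal sketch of the last step of that pipeline, assuming VAD has already cut out a speech segment and some speaker-embedding model has turned it into a vector. The names `cosine_similarity` and `is_ai_voice`, and the 0.75 threshold, are illustrative assumptions and would need tuning against the embedding model actually used.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector carries no speaker information; treat as no match.
        return 0.0
    return dot / (norm_a * norm_b)


def is_ai_voice(segment_embedding, ai_reference_embedding, threshold=0.75):
    """Hypothetical check: does this segment sound like the AI's own TTS voice?

    threshold is an assumed value, not one taken from any library.
    """
    return cosine_similarity(segment_embedding, ai_reference_embedding) >= threshold
```

Segments for which `is_ai_voice` returns `True` could be dropped before transcription, while any other voiced segment could be treated as a user interruption and used to stop playback.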
Hello Mustafa
Could you add **kwargs to transcribe?
This would allow passing extra parameters to "transcribe_audio", such as temperature, no_speech_threshold, etc.
https://github.com/mustafaaljadery/lightning-whisper-mlx/blob/main/lightning_whisper_mlx/lightning.py#L90
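A rough sketch of what the requested change could look like. To stay self-contained it uses a stand-in `transcribe_audio` and a hypothetical `LightningWhisperSketch` class, so the signatures, defaults, and model path here are assumptions and not the library's actual code.

```python
def transcribe_audio(audio, path_or_hf_repo, language=None, batch_size=12,
                     temperature=0.0, no_speech_threshold=0.6):
    """Stand-in for the real function; it only echoes the options it received."""
    return {
        "audio": audio,
        "model": path_or_hf_repo,
        "language": language,
        "batch_size": batch_size,
        "temperature": temperature,
        "no_speech_threshold": no_speech_threshold,
    }


class LightningWhisperSketch:
    """Hypothetical wrapper illustrating **kwargs forwarding."""

    def __init__(self, name, batch_size=12):
        self.name = name
        self.batch_size = batch_size

    def transcribe(self, audio_path, language=None, **kwargs):
        # Extra keyword arguments pass straight through to transcribe_audio,
        # so callers can set temperature, no_speech_threshold, and so on.
        return transcribe_audio(
            audio_path,
            path_or_hf_repo=f"./models/{self.name}",  # placeholder path
            language=language,
            batch_size=self.batch_size,
            **kwargs,
        )
```

With this shape, a call such as `LightningWhisperSketch("tiny").transcribe("audio.wav", temperature=0.2)` would reach `transcribe_audio` with `temperature=0.2`, and an unsupported keyword would raise a `TypeError` there instead of being silently ignored.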