mustafaaljadery / lightning-whisper-mlx

An extremely fast implementation of whisper optimized for Apple Silicon using MLX.
https://mustafaaljadery.github.io/lightning-whisper-mlx/
587 stars, 30 forks

Enable kwargs in transcribe function #8

Open vilsonrodrigues opened 7 months ago

vilsonrodrigues commented 7 months ago

Hello Mustafa

Could you add **kwargs to transcribe?

This would give access to the extra parameters of "transcribe_audio", such as temperature, no_speech_threshold, etc.

https://github.com/mustafaaljadery/lightning-whisper-mlx/blob/main/lightning_whisper_mlx/lightning.py#L90
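For illustration, the requested change could look roughly like the sketch below. It is self-contained and does not import the library: the `transcribe_audio` function here is a hypothetical stand-in that only echoes its options, and the `Transcriber` class, its constructor, and the default values are assumptions, not the project's actual code.

```python
from typing import Any, Dict, Optional


def transcribe_audio(
    audio_path: str,
    *,
    language: Optional[str] = None,
    temperature: float = 0.0,
    no_speech_threshold: float = 0.6,
) -> Dict[str, Any]:
    """Hypothetical stand-in for the lower-level function; it just echoes its options."""
    return {
        "audio_path": audio_path,
        "language": language,
        "temperature": temperature,
        "no_speech_threshold": no_speech_threshold,
    }


class Transcriber:
    """Hypothetical wrapper class, not the library's real one."""

    def transcribe(
        self, audio_path: str, language: Optional[str] = None, **kwargs: Any
    ) -> Dict[str, Any]:
        # Forward any extra keyword arguments unchanged to the lower-level call,
        # so callers can set options the wrapper does not name explicitly.
        return transcribe_audio(audio_path, language=language, **kwargs)


result = Transcriber().transcribe(
    "audio.mp3", temperature=0.2, no_speech_threshold=0.5
)
print(result)
```

With this shape, an unknown keyword raises a `TypeError` from the lower-level function rather than being silently ignored, which is usually the behaviour you want from a pass-through.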

fire17 commented 5 months ago

also reached the same conclusion...

Half related: have you found out how to get a speaker ID? It's important in realtime conversation over a speaker, both to ignore the AI's own output and as the best way to quickly trigger a user interruption that stops the current AI playback.

vilsonrodrigues commented 5 months ago

> also reached the same conclusion...
>
> Half related: have you found out how to get a speaker ID? It's important in realtime conversation over a speaker, both to ignore the AI's own output and as the best way to quickly trigger a user interruption that stops the current AI playback.

VAD + speech embed model + cosine similarity
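As a rough sketch of the last step of that pipeline, the comparison could look like the code below. It assumes a VAD has already cut out a speech segment and a speech embedding model has already turned it into a vector. The toy vectors, the function names, and the 0.7 threshold are all illustrative assumptions, not values from any particular model.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as matching nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def is_same_speaker(
    segment_embedding: Sequence[float],
    reference_embedding: Sequence[float],
    threshold: float = 0.7,  # assumed value; tune it for the embedding model used
) -> bool:
    """Accept the segment if its embedding is close enough to the enrolled speaker."""
    return cosine_similarity(segment_embedding, reference_embedding) >= threshold


# Toy embeddings standing in for real model output.
user_reference = [0.9, 0.1, 0.2]
user_segment = [0.8, 0.2, 0.1]
ai_playback_segment = [0.1, 0.9, 0.3]

print(is_same_speaker(user_segment, user_reference))
print(is_same_speaker(ai_playback_segment, user_reference))
```

In a realtime loop, segments that fail the check (for example, the AI's own playback picked up by the microphone) would be dropped, while a match against the user's reference could trigger the interruption.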

fire17 commented 4 months ago

> VAD + speech embed model + cosine similarity

Have you managed to do this in realtime? How instant is it? Hope it's lightning fast. Got any repos or resources to share to help me and others? Thanks @vilsonrodrigues