Closed by tqtifnypmb 7 months ago
This doesn't sound like a precision issue to me. Have you tried the same prompt / audio in the Python MLX Whisper example? If it works there then either:
If it doesn't work in the Python example either, then we'll need to investigate further (presumably you've already tested this with the original Whisper code). In that case, could you provide the input, the expected output, and steps to reproduce?
After comparing the output of MLX Whisper and my implementation part by part, I found that this issue was caused by precision losses outside MLX.
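For anyone debugging a similar port, the part-by-part comparison can be sketched roughly like this. This is a minimal pure-Python illustration with synthetic values; in practice you would dump the corresponding intermediate tensors (encoder output, each decoder block, etc.) from both the reference Python MLX Whisper and the Swift port, and the variable names below are hypothetical:

```python
# Sketch of a part-by-part numerical comparison between two implementations.
# The values here are synthetic stand-ins; the real check would load the
# dumped intermediates from the reference and the ported implementation.

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two flat tensors."""
    assert len(a) == len(b), "tensors must have the same shape"
    return max(abs(x - y) for x, y in zip(a, b))

# Pretend these came from the reference and the port at the same layer.
reference = [0.1234, -0.5678, 0.9012]
ported    = [0.1233, -0.5679, 0.9013]

diff = max_abs_diff(reference, ported)
print(f"max abs diff: {diff:.6f}")

# Walk the layers in order and stop at the first one that diverges
# beyond a tolerance -- that layer is where the bug (or precision loss) lives.
TOLERANCE = 1e-3
if diff > TOLERANCE:
    print("outputs diverge here -- inspect this layer")
else:
    print("outputs match within tolerance")
```

Checking layer by layer in execution order is what localizes the problem: the first layer whose outputs diverge is the one to inspect, and everything downstream will differ as a consequence.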
Hi,

I'm using swift-mlx to implement the Whisper model. The encoder runs on CoreML, and the decoder runs on MLX. Everything works fine for the `tiny`, `tiny.en`, `base`, `base.en`, and `small.en` models. However, I encountered some strange issues with the other models:

1) For the `small` model, whenever I include a prompt, the decoder's output becomes abnormal. If I remove the prompt, the output becomes normal again. This issue only occurs with the `small` model.
2) Both the `small` and `medium` models have problems transcribing languages other than English.

Since all other variables are the same, I wanted to ask whether the size of the model could be causing precision issues in MLX calculations?
Thanks