Open · ajinkyakulkarni14 opened this issue 11 months ago
Hi, the original Whisper architecture works on 30-second audio chunks, and shorter inputs are padded with zeros. In this implementation, we repeat the audio signal instead.
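The difference between the two padding strategies can be sketched with NumPy. This is a minimal illustration, not the repository's code. The 16 kHz sample rate and the resulting 480,000-sample window match the original Whisper model. The function names `pad_with_zeros` and `pad_by_repeating` are hypothetical.

```python
import numpy as np

SAMPLE_RATE = 16_000                     # Whisper expects 16 kHz mono audio
CHUNK_SECONDS = 30                       # fixed input window of the original Whisper model
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # 480,000 samples


def pad_with_zeros(audio: np.ndarray, n_samples: int = N_SAMPLES) -> np.ndarray:
    """Original-Whisper-style padding: trim to n_samples, then zero-fill the tail."""
    audio = audio[:n_samples]
    return np.pad(audio, (0, n_samples - len(audio)))  # constant mode pads with 0


def pad_by_repeating(audio: np.ndarray, n_samples: int = N_SAMPLES) -> np.ndarray:
    """Hypothetical helper: tile the signal until it fills n_samples, then trim."""
    if len(audio) == 0:
        raise ValueError("cannot repeat an empty signal")
    n_repeats = -(-n_samples // len(audio))  # ceiling division
    return np.tile(audio, n_repeats)[:n_samples]


# Usage: a 4-second clip becomes a full 30-second input either way.
clip = np.arange(1, SAMPLE_RATE * 4 + 1, dtype=np.float32)
zero_padded = pad_with_zeros(clip)
repeated = pad_by_repeating(clip)
```

With zero padding, the last 26 seconds are silence. With repetition, the model sees the same 4 seconds of speech tiled across the whole window, so the input contains no artificial silence.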
Unfortunately, I have not researched changing Whisper's input length, so I cannot comment on that. However, there have recently been many improvements to the Whisper architecture, and some of the newer implementations may address this issue.
Best regards, Piotr
Hi
I can see that the Whisper MesoNet recipe only works on 30-second audio. This appears to be due to Whisper's feature extraction process. Could you comment on how to change it to handle variable-length audio segments?
Regards Ajinkya Kulkarni