Whisper mesoNet only working on 30sec audio

piotrkawa / deepfake-whisper-features

Implementation of the paper "Improved DeepFake Detection Using Whisper Features"

MIT License


Whisper mesoNet only working on 30sec audio #8

Open ajinkyakulkarni14 opened 11 months ago

ajinkyakulkarni14 commented 11 months ago

The Whisper MesoNet recipe appears to work only on 30-second audio, which seems to be caused by the Whisper feature extraction process. Could you comment on how to change it to handle variable-length audio segments?

Regards, Ajinkya Kulkarni

piotrkawa commented 11 months ago

Hi, the original Whisper architecture works on 30-second audio chunks; inputs shorter than that are padded with zeros. In this implementation, we repeat the audio signal instead.
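To illustrate the difference between the two padding strategies, here is a minimal NumPy sketch. It is not the repository's actual code: the function names `pad_with_zeros` and `pad_by_repeating` are hypothetical, and it assumes a mono waveform sampled at Whisper's 16 kHz rate, so that 30 seconds corresponds to 480,000 samples.

```python
import numpy as np

SAMPLE_RATE = 16_000          # Whisper expects 16 kHz audio
N_SAMPLES = 30 * SAMPLE_RATE  # 480,000 samples in one 30-second window


def pad_with_zeros(waveform: np.ndarray, target_len: int = N_SAMPLES) -> np.ndarray:
    """Trim to target_len, then append zeros (silence) if the clip is shorter."""
    waveform = waveform[:target_len]
    return np.pad(waveform, (0, target_len - len(waveform)))


def pad_by_repeating(waveform: np.ndarray, target_len: int = N_SAMPLES) -> np.ndarray:
    """Loop the clip until it covers target_len samples, then trim the excess."""
    if len(waveform) == 0:
        raise ValueError("cannot repeat an empty waveform")
    repeats = -(-target_len // len(waveform))  # ceiling division
    return np.tile(waveform, repeats)[:target_len]


if __name__ == "__main__":
    clip = np.arange(1, 4, dtype=np.float32)  # toy 3-sample "audio" clip
    print(pad_with_zeros(clip, 7))
    print(pad_by_repeating(clip, 7))
```

In both cases the feature extractor still receives a fixed 30-second input; only the content that fills the remainder of the window differs.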

Unfortunately, I did not research changing Whisper's input length, so I cannot comment on that. However, recently, there have been many improvements over Whisper architectures, so some of the newer implementations may address this issue.

Best regards, Piotr