piotrkawa / deepfake-whisper-features

Implementation of the paper "Improved DeepFake Detection Using Whisper Features"
MIT License

How to extract intermediate features of audio by whisper? #16

Open jcl-gx-02 opened 1 month ago

jcl-gx-02 commented 1 month ago

I am very interested in your work, but I don't understand how to extract the intermediate features of the audio with Whisper and then use them as input to the back-end network.

piotrkawa commented 1 week ago

Hi, the Whisper model we use in this codebase is based on the original implementation; however, for our purposes, we use only the encoder part of the network (here).

The "Whisper features" are extracted inside the corresponding model architectures, e.g. here.

We first prepare the waveform (to ensure it has the correct length), then convert it to a mel-spectrogram, and finally pass that to Whisper's encoder. The encoder output is our front-end, which can then be concatenated with other front-ends such as MFCC or LFCC.
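For illustration, the three steps could be sketched roughly as below. This is not the repository's code: `prepare_waveform`, `whisper_front_end`, and `combine_front_ends` are hypothetical names, zero-padding to a 30-second window at 16 kHz is an assumption borrowed from the original Whisper package, and the mel and encoder steps are passed in as callables so the sketch runs without model weights. With the `openai-whisper` package installed, `whisper.log_mel_spectrogram` and a loaded model's `encoder` would be the natural things to plug in.

```python
import numpy as np

SAMPLE_RATE = 16_000          # assumption: Whisper-style 16 kHz audio
N_SAMPLES = 30 * SAMPLE_RATE  # assumption: fixed 30-second input window


def prepare_waveform(waveform, target_len=N_SAMPLES):
    """Trim or zero-pad a 1-D waveform to exactly target_len samples."""
    waveform = np.asarray(waveform, dtype=np.float32)
    if waveform.shape[0] >= target_len:
        return waveform[:target_len]
    return np.pad(waveform, (0, target_len - waveform.shape[0]))


def whisper_front_end(waveform, to_mel, encoder):
    """Fixed-length waveform -> mel-spectrogram -> encoder output."""
    fixed = prepare_waveform(waveform)
    mel = to_mel(fixed)    # e.g. whisper.log_mel_spectrogram
    return encoder(mel)    # e.g. the encoder of a loaded Whisper model


def combine_front_ends(*features):
    """Concatenate front-ends along the last (feature) axis.

    Assumes every input already has the same number of frames.
    """
    return np.concatenate(features, axis=-1)


# Stand-ins with hypothetical shapes, used only to make the sketch runnable.
def fake_mel(fixed):
    return np.zeros((80, 3000), dtype=np.float32)


def fake_encoder(mel):
    return np.zeros((1500, 384), dtype=np.float32)


short_clip = np.ones(4 * SAMPLE_RATE, dtype=np.float32)  # 4-second dummy clip
whisper_feats = whisper_front_end(short_clip, fake_mel, fake_encoder)
lfcc_feats = np.zeros((1500, 60), dtype=np.float32)      # hypothetical LFCC front-end
combined = combine_front_ends(whisper_feats, lfcc_feats)
```

Passing the mel and encoder steps in as arguments keeps the length handling and the concatenation testable on their own; how the second front-end is aligned to the encoder's frame count depends on the architecture and is not shown here.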

Piotr