Hi, I used mp3 to reduce the size of the dataset and overcome the slowdown from I/O.
You can use wav as well; in that case, there is no need to decode the file here.
Keep in mind that if you change the sampling rate, you need to adjust the STFT settings here to match.
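To illustrate the point about matching the STFT settings to the sampling rate, here is a rough sketch. It is not the code from this repo: the function name `scale_stft_settings` and the example values (window 1024, hop 320 at 32k) are assumptions picked for illustration. The idea is to scale the window and hop sizes, which are given in samples, by the ratio of the new rate to the old one, so each frame still covers the same duration of audio.

```python
def scale_stft_settings(window_size, hop_size, old_sr, new_sr):
    """Scale STFT window and hop sizes (in samples) so that each frame
    covers the same duration of audio at the new sampling rate."""
    ratio = new_sr / old_sr
    new_window = int(round(window_size * ratio))
    new_hop = int(round(hop_size * ratio))
    return new_window, new_hop


# Hypothetical settings: window 1024 and hop 320 at 32 kHz, moved to 16 kHz.
window, hop = scale_stft_settings(1024, 320, old_sr=32000, new_sr=16000)
print(window, hop)  # 512 160
```

Any frequency limits in the mel filterbank also need checking, since the highest usable frequency is half the sampling rate (8 kHz for 16k audio).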
I have a question. Why convert .wav to .mp3 at 32k? And what would happen if I converted to 16k and used .wav files instead?