specify fmin and fmax for Spectrogram

pytorch / audio

Data manipulation and transformation for audio signal processing, powered by PyTorch

https://pytorch.org/audio

BSD 2-Clause "Simplified" License


specify fmin and fmax for Spectrogram #3732

Open bilzard opened 5 months ago

bilzard commented 5 months ago

🚀 The feature

Allow specifying fmin and fmax for Spectrogram, as is already possible for MelSpectrogram.

Motivation, pitch

We can specify fmin and fmax for MelSpectrogram, but not for Spectrogram. If we only need a specific frequency band, computing the frequencies outside it costs extra memory and computation. This feature would also make the Spectrogram and MelSpectrogram transforms consistent with each other.

Alternatives

I am not aware of an existing workaround that would let me:

specify fmin and fmax
extract linear filter banks

Additional context

No response

bilzard commented 5 months ago

I misunderstood the current implementation of MelSpectrogram. It is just a combination of the Spectrogram and MelScale transforms [1], so its computational cost is essentially the same as that of Spectrogram.

Nevertheless, I am still interested in whether fmin and fmax could be specified directly in the Spectrogram transform. As far as I understand, it is technically possible and would reduce computation and memory costs in the cases I mentioned above.

[1] https://pytorch.org/audio/main/generated/torchaudio.transforms.MelSpectrogram.html#torchaudio.transforms.MelSpectrogram
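
To illustrate why restricting the band is technically possible, the sketch below evaluates the STFT only at the requested bins by multiplying windowed frames with a band-restricted DFT basis. `band_stft` is a hypothetical helper, not a torchaudio API, and it assumes a Hann window and no centering. It allocates only the requested bins, but a dense matrix product is not necessarily faster than a full FFT unless the band is narrow.

```python
import math

import torch


def band_stft(x, n_fft, hop, lo_bin, hi_bin):
    """Hypothetical helper: STFT restricted to bins lo_bin..hi_bin (inclusive)."""
    window = torch.hann_window(n_fft)
    frames = x.unfold(-1, n_fft, hop) * window  # (..., frames, n_fft)
    k = torch.arange(lo_bin, hi_bin + 1, dtype=torch.float64)
    n = torch.arange(n_fft, dtype=torch.float64)
    angle = -2 * math.pi * k[:, None] * n[None, :] / n_fft
    # DFT basis exp(-2*pi*i*k*n/n_fft), built only for the requested bins.
    basis = torch.polar(torch.ones_like(angle), angle).to(torch.complex64)
    out = frames.to(torch.complex64) @ basis.T  # (..., frames, bins)
    return out.transpose(-1, -2)                # (..., bins, frames)


torch.manual_seed(0)
x = torch.randn(1, 2000)
n_fft, hop, lo, hi = 100, 50, 5, 10

band = band_stft(x, n_fft, hop, lo, hi)

# Reference: full STFT with the same window, sliced to the same bins.
ref = torch.stft(
    x, n_fft, hop_length=hop, window=torch.hann_window(n_fft),
    center=False, return_complex=True,
)[..., lo : hi + 1, :]

print(band.shape, torch.allclose(band, ref, atol=1e-3))
```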

bilzard commented 5 months ago

I found a workaround for fmin=0 Hz.

We can simply down-sample the original sequence until the Nyquist frequency of the new sampling rate equals the upper limit of the band we want. E.g., if we only want the 0-20 Hz frequency band and the original sampling frequency is 200 Hz, we can down-sample the original sequence to 40 Hz (1/5) and pass it to the STFT.

The issue remains for fmin > 0 Hz, but in my case (fmin = 0 Hz) it is solved.