pytorch / audio

Data manipulation and transformation for audio signal processing, powered by PyTorch
https://pytorch.org/audio
BSD 2-Clause "Simplified" License

Pre-emphasis and its variants? #2323

Closed (underdogliu closed 1 year ago)

underdogliu commented 2 years ago

🚀 The feature

I would like to have pre-emphasis filtering available as an audio processing step, or perhaps add it myself.

Motivation, pitch

As we all know, pre-emphasis boosts the energy in the high frequencies relative to the low ones, which matters especially for voiced segments. It is beneficial at least for speaker verification tasks, and I believe for others as well.
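To put rough numbers on that boost, assume the usual first-order form y[n] = x[n] - α·x[n-1] with α = 0.97 (a commonly used value, not something fixed by this issue). Its magnitude response is:

$$
H(e^{j\omega}) = 1 - \alpha e^{-j\omega},
\qquad
\left|H(e^{j\omega})\right|^2 = 1 - 2\alpha\cos\omega + \alpha^2
$$

At ω = 0 the gain is 1 - α = 0.03 (about -30.5 dB) and at ω = π (Nyquist) it is 1 + α = 1.97 (about +5.9 dB), so under this assumption the filter mostly attenuates low frequencies and mildly amplifies the highest ones.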

Alternatives

Going further, there are other linear/nonlinear filtering and normalization operations that could be integrated, most of which can be sourced from other audio toolkits like librosa. But I think we should focus on pre-emphasis in torchaudio.transforms and torchaudio.functional first.

Additional context

No response

carolineechen commented 2 years ago

Hi @underdogliu, thanks for the suggestion! We don't have objections against supporting pre-emphasis, but were wondering if you could elaborate a bit more on what you're referring to for the variants, and if there's any existing implementation/paper/references you can link regarding this?

underdogliu commented 2 years ago

@carolineechen Sorry for the late reply. I have been busy with many things in parallel.

So pre-emphasis is nothing but a time-domain FIR filter. By "variants" I mean other types of filters that could be used to flatten the spectrum. But of course, we can start with a minimal version; the final decision is yours.
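A minimal sketch of that FIR filter in plain Python, assuming the common first-order form y[n] = x[n] - coeff * x[n-1] with x[-1] taken as 0. The function name `preemphasis` and the default `coeff=0.97` are illustrative choices here, not an API taken from this thread:

```python
def preemphasis(samples, coeff=0.97):
    """First-order pre-emphasis: y[n] = x[n] - coeff * x[n-1].

    The sample before the start of the signal is assumed to be 0.
    """
    out = []
    prev = 0.0
    for x in samples:
        out.append(x - coeff * prev)
        prev = x
    return out


# A constant (low-frequency) signal is attenuated after the first sample,
# while an alternating (high-frequency) signal is amplified.
flat = preemphasis([1.0, 1.0, 1.0], coeff=0.5)
alternating = preemphasis([1.0, -1.0, 1.0], coeff=0.5)
```

With `coeff=0.5`, the constant input settles at 0.5 while the alternating input grows to magnitude 1.5, which is the spectrum-flattening behaviour described above.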

One reference: https://mini.dcs.shef.ac.uk/wp-content/papercite-data/pdf/loweimi_nolisp13.pdf

carolineechen commented 2 years ago

@underdogliu got it, yea I think adding standard pre-emphasis to torchaudio transforms and functional (under filtering) could be a good starting point! Is this something you're interested in working on?

also quick question, would we need to add a corresponding de-emphasis function for this to be useful, or is that not necessary or already handled by torchaudio's deemph_biquad function?
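On the de-emphasis question: the exact inverse of first-order pre-emphasis is the one-pole IIR filter y[n] = x[n] + coeff * y[n-1]. My understanding is that torchaudio's deemph_biquad is the CD de-emphasis shelving filter ported from SoX, which would make it a different filter, but that is worth checking against the docs. The sketch below uses hypothetical helper names and plain Python lists to show the round trip:

```python
def preemphasis(samples, coeff=0.97):
    """y[n] = x[n] - coeff * x[n-1], with x[-1] assumed to be 0."""
    out, prev = [], 0.0
    for x in samples:
        out.append(x - coeff * prev)
        prev = x
    return out


def deemphasis(samples, coeff=0.97):
    """Inverse filter: y[n] = x[n] + coeff * y[n-1], with y[-1] assumed to be 0."""
    out, prev = [], 0.0
    for x in samples:
        y = x + coeff * prev
        out.append(y)
        prev = y
    return out


signal = [0.5, -0.25, 1.0, 0.0, -0.75]
restored = deemphasis(preemphasis(signal))
# Largest round-trip error; only floating-point rounding is expected.
max_error = max(abs(a - b) for a, b in zip(signal, restored))
```

Because the second filter undoes the first sample by sample, `restored` matches `signal` up to floating-point rounding.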

underdogliu commented 2 years ago

Yeah, if necessary I am happy to spend some time developing it while getting more familiar with how torchaudio works. Of course, such a first-order time-domain FIR filter can be regarded as a special case of the biquad function (b_0=1, a_0=1, b_1=-alpha, all other coefficients zero).
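Written out, and assuming the standard biquad transfer function with numerator coefficients b_i and denominator coefficients a_i, the special case reduces as follows:

$$
H(z) = \frac{b_0 + b_1 z^{-1} + b_2 z^{-2}}{a_0 + a_1 z^{-1} + a_2 z^{-2}}
\;\;\Longrightarrow\;\;
H(z) = 1 - \alpha z^{-1}
\quad\text{for } b_0 = 1,\; b_1 = -\alpha,\; b_2 = 0,\; a_0 = 1,\; a_1 = a_2 = 0
$$

In the time domain this is y[n] = x[n] - α·x[n-1], with no feedback terms.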

Speaking of that function, I also have a possibly naive question: when I checked it, I found that most of the simple computations are done with math instead of torch. Is that because we are handling scalars? I am not sure about this, especially if we want to make certain parameters learnable (analogous to PCEN and learnable STFT).

faroit commented 2 years ago

@underdogliu a good start might be adopting https://github.com/csteinmetz1/auraloss/blob/main/auraloss/perceptual.py#L39

stonelazy commented 1 year ago

I hope we can implement pre-emphasis filtering with torchaudio.functional.lfilter. Can somebody please comment on this?
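A rough sketch of what that would look like. torchaudio.functional.lfilter takes `(waveform, a_coeffs, b_coeffs)` with coefficient tensors of equal length, and clamps its output to [-1, 1] unless `clamp=False` is passed. To stay dependency-free, the code below uses a hypothetical pure-Python stand-in, `lfilter_1d`, with the same argument order; only the coefficient layout is meant to carry over:

```python
def lfilter_1d(samples, a_coeffs, b_coeffs):
    """Direct-form IIR filter on a list of floats (hypothetical stand-in).

    a_coeffs: denominator [a0, a1, ...]; b_coeffs: numerator [b0, b1, ...].
    """
    if len(a_coeffs) != len(b_coeffs):
        raise ValueError("a_coeffs and b_coeffs must have the same length")
    out = []
    for n in range(len(samples)):
        acc = 0.0
        # Feed-forward part: sum of b[k] * x[n-k].
        for k, b in enumerate(b_coeffs):
            if n - k >= 0:
                acc += b * samples[n - k]
        # Feedback part: subtract a[k] * y[n-k] for k >= 1.
        for k in range(1, len(a_coeffs)):
            if n - k >= 0:
                acc -= a_coeffs[k] * out[n - k]
        out.append(acc / a_coeffs[0])
    return out


alpha = 0.97                # assumed coefficient, a common default
a_coeffs = [1.0, 0.0]       # padded with 0 so both lists have equal length
b_coeffs = [1.0, -alpha]    # y[n] = x[n] - alpha * x[n-1]
filtered = lfilter_1d([1.0, 1.0, -1.0], a_coeffs, b_coeffs)
```

The last output sample has magnitude 1.97, so with the real lfilter the default clamping would clip it; passing `clamp=False` is probably what one wants for pre-emphasis.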

carolineechen commented 1 year ago

addressed in #2871