keunwoochoi / kapre

kapre: Keras Audio Preprocessors
MIT License
922 stars 146 forks source link

Problem with backend.magnitude_to_decibel #121

Open kenders2000 opened 3 years ago

kenders2000 commented 3 years ago

Hi,

I think there is an issue with backend.magnitude_to_decibel(). This calculates the db value as 10 * log10(x), where x is the magnitude (i.e. x = abs(stft)).

As I understand, the decibel value of the magnitude of a signal should be 20 * log10(x) not 10 * log10(x). The latter is the decibel value of the signal power, i.e. it expects: 10 * log10(x**2).

I was confused as it is passing unit tests against librosa, but I think I know why. In tests.test_backend.test_magnitude_to_decibel() magnitude_to_decibel() is being tested against librosa.power_to_db() which expects power as an input.

And in tests.test_time_frequency.test_melspectrogram_correctness() the reference melspectrogram is being computed with a power of 1, which I believe returns a magnitude melspectrogram not the power spectrum as is usual.

    # compute with librosa
    S_ref = librosa.feature.melspectrogram(
        src_mono,
        sr=sr,
        n_fft=n_fft,
        hop_length=hop_length,
        win_length=win_length,
        center=False,
        power=1.0,
        n_mels=n_mels,
        fmin=mel_f_min,
        fmax=mel_f_max,
    ).T

Basically at the moment backend.magnitude_to_decibel() is equivalent to librosa.core.spectrum.power_to_db() but I think it should be equivalent to librosa.core.spectrum.amplitude_to_db(), see librosa.core.spectrum.

Let me know if I have got the wrong end of the stick here! But if not I am happy to fix this. I just wanted to check that my thinking is correct. The result would be that all spectrograms would be scaled by a factor of two, so while trivial, all spectrogram converted to decibels would change in this way.

Kind regards

Paul

keunwoochoi commented 3 years ago

Hi, first of all, your understanding about the current implementation is correct. But I set it intentionally and named it magnitude_, instead of either amplitude_ or power_. So it's not quite a DSP thing, it's rather an implementation of what people often apply after abs(STFT).