kyungyunlee / ismir2018-revisiting-svd

Revisiting Singing Voice Detection : a Quantitative Review and the Future Outlook

Doubt: Double Stage HPSS calculated over first P component #2

Open Vichoko opened 4 years ago

Vichoko commented 4 years ago

I've been thinking a lot about this code fragment in https://github.com/kyungyunlee/ismir2018-revisiting-svd/blob/master/leglaive_lstm/audio_processor.py, in the function process_single_audio ("Compute double stage HPSS for the given audio file"), lines 24-33:

    audio_src, _ = librosa.load(audio_file, sr=SR)
    # Normalize audio signal
    audio_src = librosa.util.normalize(audio_src)
    # first HPSS
    D_harmonic, D_percussive = ono_hpss(audio_src, N_FFT1, N_HOP1)
    # second HPSS
    D2_harmonic, D2_percussive = ono_hpss(D_percussive, N_FFT2, N_HOP2)

    assert D2_harmonic.shape == D2_percussive.shape
    print(D2_harmonic.shape, D2_percussive.shape)

Both D2_harmonic and D2_percussive are computed from D_percussive, i.e. the percussive output of the first HPSS stage; D_harmonic is never used again.

Is this right? I'm currently checking the original paper and I will keep you updated if I discover something.

This seems odd to me, since my intuition says that the harmonic component matters more for voice activity detection.

kyungyunlee commented 4 years ago

Hi, if you check the paper by Leglaive et al., which is referenced in readme.md, you will see that they treat the singing voice as lying in between the percussive and harmonic components. I think that makes sense since consonants, for instance, are more percussive than harmonic. Also, singing styles vary, so it is hard to assume that a double harmonic component will capture all of them. But if you are not reproducing the paper, you are always free to experiment with different methods. :)