keunwoochoi / torchaudio-contrib

A test bed for updates and new features | pytorch/audio

harmonic-percussive source separation #25

Open keunwoochoi opened 5 years ago

keunwoochoi commented 5 years ago

torch has median(). A median-filtering-based harmonic-percussive source separation (HPSS) can be implemented easily. It's fairly MIR-specific, though.
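For readers following along, here is a minimal sketch of what a median-filtering HPSS on top of torch's median could look like. It is an illustration and not the implementation proposed in this thread: the function names (`median_filter`, `hpss_masks`), the kernel size, the mask exponent, and the reflect padding are all assumptions. It smooths a magnitude spectrogram along time to enhance harmonic content, along frequency to enhance percussive content, and turns the two into soft masks.

```python
import torch
import torch.nn.functional as F


def median_filter(mag: torch.Tensor, kernel_size: int, dim: int) -> torch.Tensor:
    """Sliding median of a 2-D (freq, time) tensor along `dim`; output keeps the input shape."""
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd so the median is a single element")
    pad = kernel_size // 2
    # Move the axis to be filtered to the last position.
    x = mag.transpose(dim, -1).contiguous()
    # Reflect-pad the last axis; a leading dim is added because 1-D reflect padding
    # expects a (batch, channel, length)-style input.
    x = F.pad(x.unsqueeze(0), (pad, pad), mode="reflect").squeeze(0)
    # Sliding windows of length kernel_size with stride 1: (..., length, kernel_size).
    windows = x.unfold(-1, kernel_size, 1)
    return windows.median(dim=-1).values.transpose(dim, -1)


def hpss_masks(mag: torch.Tensor, kernel_size: int = 17, power: float = 2.0, eps: float = 1e-8):
    """Return (harmonic_mask, percussive_mask) for a magnitude spectrogram of shape (freq, time)."""
    harm = median_filter(mag, kernel_size, dim=-1)  # median along time keeps horizontal ridges
    perc = median_filter(mag, kernel_size, dim=-2)  # median along frequency keeps vertical ridges
    harm_p, perc_p = harm ** power, perc ** power
    total = harm_p + perc_p + eps  # eps avoids division by zero in silent bins
    return harm_p / total, perc_p / total


# Toy spectrogram: one sustained tone (a row) and one click (a column).
mag = torch.zeros(64, 100)
mag[20, :] = 1.0
mag[:, 50] = 1.0
mask_h, mask_p = hpss_masks(mag)
```

Multiplying `mag` (or the complex STFT it came from) by `mask_h` and `mask_p` gives the harmonic and percussive estimates. The unfold-based window costs `kernel_size` times the memory of the input, which is a trade-off of this sketch rather than a requirement of the method.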

faroit commented 5 years ago

I think we should stay away from classical DSP-based methods, since those will be replaced by deep-learning-based ones sooner or later anyway (https://arxiv.org/abs/1807.11298). Same for standard filters, etc.

Personally, I'm against anything except STFT, unless it's perceptually motivated (which mel is... partly). But convince me that we still need that! ;-)

keunwoochoi commented 5 years ago

No, that's fair overall. But I don't think that having deep-learning-based models out there should be a reason for us not to have something (for partly the same reason I'd like to have MFCC here), because, well, then are we 'deploying' one of those pre-trained models? Are those things mature enough to be a part of this? Would they be consistent (or even working) with newer torch versions? In short, I don't think we're there yet in general. Meanwhile, things like HPSS/MFCC, once deployed, would work without too much burden.

But I'm also not 100% sure. And I'll need it anyway (I'm going to implement it right now), so... we can wait and see :)

faroit commented 5 years ago

Okay, maybe we can draw the line and implement things in this order:

  1. method is (still) useful as an input feature transform: 👍 (e.g. MFCC, MEL)
  2. method (still) serves as a baseline, so people benefit from GPU implementations: 👍 / 👎 (e.g. HPSS, Yin...), nice to have, but not priority
  3. method is part of package x, so it should be in torchaudio: 👎 (e.g. some heuristics-based beat detection)

?

I think we should get the basics right and fast first. But let's keep this open. Also, this should come after #5, since it only makes sense to do HPSS if we can apply the mask.
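To make the "apply the mask" step concrete, here is a rough sketch using the existing torch.stft / torch.istft calls. It is not whatever #5 provides; the helper name `apply_mask_and_invert`, the `mask_fn` callable, and the STFT settings (Hann window, n_fft=1024, hop 256) are assumptions chosen for illustration.

```python
import torch


def apply_mask_and_invert(waveform, mask_fn, n_fft=1024, hop_length=256):
    """Mask the complex STFT of a 1-D waveform and resynthesize it.

    `mask_fn` (hypothetical) maps a magnitude spectrogram (freq, time)
    to a real-valued mask of the same shape.
    """
    window = torch.hann_window(n_fft)
    spec = torch.stft(
        waveform, n_fft, hop_length=hop_length, window=window, return_complex=True
    )
    mask = mask_fn(spec.abs())
    # The real mask scales the magnitude and leaves the mixture phase untouched.
    return torch.istft(
        spec * mask, n_fft, hop_length=hop_length, window=window,
        length=waveform.shape[-1],
    )


# Toy check with two complementary binary masks on random audio.
torch.manual_seed(0)
waveform = torch.randn(8000)


def loud_bins(mag):
    return (mag > mag.median()).float()


def quiet_bins(mag):
    return 1.0 - loud_bins(mag)


part_a = apply_mask_and_invert(waveform, loud_bins)
part_b = apply_mask_and_invert(waveform, quiet_bins)
```

Because the inverse STFT is linear, two masks that sum to one give two signals that add back up to the input, which is a handy sanity check for any harmonic/percussive mask pair.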

I didn't mean including pretrained models (that would also be nice though, CREPE anyone?).

keunwoochoi commented 5 years ago

Great lines you drew there :) And I 100% agree. If they are indexed as 1, 2, 3: let's do 1 first. Let's see if we should do 3 Let's not do 2.

Pre-trained models: like a model zoo for audio? You know, they'd bring tons of headaches. We could probably provide a sort of awesome-list-pytorch-audio-models instead of hosting them.

keunwoochoi commented 5 years ago

Note: HPSS is in this gist: https://gist.github.com/keunwoochoi/dcbaf3eaa72ca22ea4866bd5e458e32c

keunwoochoi commented 5 years ago

I don't understand why I said "Let's see if we should do 3 Let's not do 2." instead of "Let's see if we should do 2 Let's not do 3."