Open keunwoochoi opened 5 years ago
I think we should stay away from classical DSP-based methods, since those will be replaced by deep-learning-based ones sooner or later anyway (https://arxiv.org/abs/1807.11298). Same for standard filters, etc.
Personally, I'm against anything except STFT, unless it's perceptually motivated (which mel is... partly). But convince me that we still need this! ;-)
No, that's fair overall. But I don't think the existence of deep-learning-based models should be a reason for us not to have something (partly for the same reason I'd like to have MFCC here). Because, well, are we then 'deploying' one of those pre-trained models? Are they mature enough to be part of the library? Would they stay consistent (or even keep working) with newer torch versions? In short, I don't think we're there yet in general. Meanwhile, once deployed, things like HPSS/MFCC would keep working without too much burden.
But I'm also not 100% sure. And I'll need it anyway (I'm gonna implement it right now), so... we can wait and see :)
Okay, maybe we can draw the line and implement it in that order:
?
I think we should get the basics right and fast first. But let's keep this open. Also, this should come after #5, since it only makes sense to do HPSS if we can apply the mask.
I didn't mean including pretrained models (that would also be nice though. CREPE, anyone?).
Great lines you draw there :) And I 100% agree. If they are indexed as 1, 2, 3: let's do 1 first. Let's see if we should do 3. Let's not do 2.
Pre-trained models - like, a model zoo for audio? You know, there'd be tons of headaches with them. Probably we can provide a sort of awesome-list-pytorch-audio-models instead of hosting them.
Note - HPSS as a gist: https://gist.github.com/keunwoochoi/dcbaf3eaa72ca22ea4866bd5e458e32c
I don't understand why I said

> Let's see if we should do 3. Let's not do 2.

instead of

> Let's see if we should do 2. Let's not do 3.
torch has `median()`, so a median-filtering-based harmonic/percussive source separation can be implemented easily. It's quite a bit MIR-only though.
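To make the idea concrete, here is a minimal sketch of median-filtering HPSS in the style of Fitzgerald (2010): smooth the magnitude spectrogram along time to enhance harmonics, along frequency to enhance percussives, then build soft masks. It's written in plain NumPy for readability (the torch version would just swap in `torch.median`); the function names, kernel size, and mask power are my own illustrative choices, not from the gist above.

```python
import numpy as np

def median_filter_1d(x, k, axis):
    """Median-filter array x with an odd window k along the given axis (edge-padded)."""
    pad = [(0, 0)] * x.ndim
    pad[axis] = (k // 2, k // 2)
    xp = np.pad(x, pad, mode="edge")
    # Stack the k shifted views of the padded array and take their pointwise median.
    views = []
    for i in range(k):
        idx = [slice(None)] * x.ndim
        idx[axis] = slice(i, i + x.shape[axis])
        views.append(xp[tuple(idx)])
    return np.median(np.stack(views), axis=0)

def hpss_masks(S, kernel=17, power=2.0, eps=1e-8):
    """Soft harmonic/percussive masks for a magnitude spectrogram S (freq x time).

    Harmonic content is smooth along time, percussive content along frequency,
    so median filtering each way separates them (Fitzgerald-style HPSS).
    """
    H = median_filter_1d(S, kernel, axis=1)  # smooth across time -> harmonic estimate
    P = median_filter_1d(S, kernel, axis=0)  # smooth across frequency -> percussive estimate
    denom = H**power + P**power + eps
    return H**power / denom, P**power / denom
```

The masks would then be multiplied with the complex STFT before the inverse transform, which is exactly why this depends on #5 (mask application) above.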