tky823 / DNN-based_source_separation

A PyTorch implementation of DNN-based source separation.

Model common usage #66

Closed ghost closed 3 years ago

ghost commented 3 years ago

Hi tky823, thank you so much for your work, this is a great framework. I just have a quick question about the usage of the models. I see you provide Conv-TasNet & D3Net for training on MUSDB, but not TasNet and, more specifically, DPRNNTasNet. Do these models also work on signals other than pure speech? And, in your opinion, what is the best current model for musical signals? Thank you very much.

tky823 commented 3 years ago

As described in "Music Source Separation in the Waveform Domain", TasNet works for music source separation. However, I haven't tried it because it would take a lot of time to train with all the data in MUSDB. Also, please note that the architecture of D3Net has not been completely reproduced. I think the best model these days is demucs, which is easy to reproduce because the code is provided.
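
The thread doesn't show a concrete call, but the waveform-domain models in this repo share the same tensor contract, so a minimal usage sketch might look like the following. The import path, class name, and constructor arguments are assumptions rather than the repo's verified API; the egs/ recipes show the real invocations.

```python
import torch

# Import path and class name are assumptions about this repo's layout;
# check src/models/ for the real signatures.
from models.dprnn_tasnet import DPRNNTasNet

# n_sources=4 for a MUSDB-style split (bass / drums / other / vocals);
# the argument name is assumed.
model = DPRNNTasNet(n_sources=4)
model.eval()

# Waveform-domain models of this family take raw audio:
# (batch, channel, time) in, (batch, n_sources, time) out.
waveform = torch.randn(1, 1, 44100 * 4)  # 4 seconds at 44.1 kHz

with torch.no_grad():
    estimates = model(waveform)
print(estimates.shape)  # expected: (1, 4, 176400)
```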

ghost commented 3 years ago

You're right, demucs (in its v2 implementation) is a very good project, especially for the bass / drums separation. However, the vocals / other stems suffer from some interference (bleed). That's why, at this time, a model like LaSAFT allows better vocal separation but seems to be below D3Net's SDR. My project is to select the best model for each type of instrument and (maybe) extend the corpus with extra tracks (not using MUSDB).
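
As a rough illustration of that per-instrument selection idea: once each separator has exported its stems, recombining them is mostly file plumbing. A minimal sketch, with all paths hypothetical and soundfile/NumPy standing in for whatever I/O is preferred:

```python
import os

import numpy as np
import soundfile as sf

# Stems previously exported by two different separators; all paths are
# hypothetical placeholders.
drums, sr = sf.read("demucs_out/drums.wav")
bass, _ = sf.read("demucs_out/bass.wav")
other, _ = sf.read("demucs_out/other.wav")
vocals, _ = sf.read("lasaft_out/vocals.wav")
mixture, _ = sf.read("song.wav")

# Trim everything to a common length before comparing or re-exporting.
n = min(map(len, (drums, bass, other, vocals, mixture)))
drums, bass, other, vocals, mixture = (x[:n] for x in (drums, bass, other, vocals, mixture))

# Stems taken from different models won't sum exactly to the mixture; the
# residual RMS gives a rough idea of how consistent the hybrid set is.
residual = mixture - (drums + bass + other + vocals)
print("residual RMS:", np.sqrt(np.mean(residual ** 2)))

os.makedirs("hybrid_out", exist_ok=True)
for name, stem in [("drums", drums), ("bass", bass), ("other", other), ("vocals", vocals)]:
    sf.write(f"hybrid_out/{name}.wav", stem, sr)
```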

Lead and backing vocals separated, and maybe the guitars / keyboards with their specificities (lead/rhythm guitar, acoustic/electric for the guitar and the piano, organ, synth, ...). Focused mainly on the guitars, because I don't have a lot of studio songs with keyboards.

That's why I suspect that a model like spleeter, trained on something like 25k songs, was potentially not trained on true studio recordings but maybe on (good) MIDI files rendered with (good) virtual instruments (VSTi). And even if not, this could be a good way to create a corpus with extended data. Example: the same piano part rendered with multiple models of acoustic piano from a bundle like Chocolate Audio's "The 88 Series Pianos".
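
A sketch of that rendering idea, using pretty_midi with FluidSynth SoundFonts instead of commercial VSTi (which would need a plugin host); the file names are hypothetical and pyfluidsynth must be installed:

```python
import pretty_midi
import soundfile as sf

FS = 44100
part = pretty_midi.PrettyMIDI("piano_part.mid")  # hypothetical input file

# Each SoundFont stands in for one "model of acoustic piano".
for sf2 in ["piano_a.sf2", "piano_b.sf2", "piano_c.sf2"]:
    audio = part.fluidsynth(fs=FS, sf2_path=sf2)  # mono float waveform
    sf.write(f"piano_part_{sf2.removesuffix('.sf2')}.wav", audio, FS)
```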

Because I don't have a good GPU, I need to be careful about the model trained for each part, because I will pay for online storage and online training (GPU).

I note that your D3Net implementation is not ready. I've seen that the Sony repo will release (soon?) the original code of the paper, probably after the CVPR 2021 conference: https://github.com/sony/ai-research-code/tree/master/d3net

In any case, thank you very much for your work and the time you spend on my messages!

lyghter commented 3 years ago

It seems like Sony released the nnabla d3net implementation yesterday

tky823 commented 3 years ago

Great information. I will check it.

tky823 commented 3 years ago

I have now updated D3Net (network architecture only). Check src/models/d3net.py and egs/tutorials/d3net/sample.ipynb.
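
A minimal sketch of exercising the updated architecture follows; the constructor arguments and the input shape convention are assumptions, and egs/tutorials/d3net/sample.ipynb shows the actual usage.

```python
import torch

from models.d3net import D3Net  # module path from the comment above; class name assumed

model = D3Net()  # real hyperparameters are in the sample notebook
model.eval()

# D3Net (MMDenseNet family) works on spectrogram-like input; the shape
# convention here is an assumption: (batch, channel, freq_bins, frames).
spec = torch.randn(1, 2, 2049, 256)

with torch.no_grad():
    out = model(spec)
print(out.shape)
```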

lyghter commented 3 years ago

Are you planning to make a training script for D3Net?

tky823 commented 3 years ago

It's difficult to implement right away...
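
For reference, a bare skeleton of what such a script would involve (not the repo's training code): the model call, its input convention, and the hyperparameters below are all assumptions, with the musdb package handling the dataset side.

```python
import musdb
import torch

from models.d3net import D3Net  # class name assumed, as above

mus = musdb.DB(root="path/to/musdb18", subsets="train")
model = D3Net()  # hyperparameters assumed
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.MSELoss()
window = torch.hann_window(4096)

def magnitude(wave):
    """(channels, samples) waveform -> (1, channels, freq, frames) magnitude."""
    stft = torch.stft(wave, n_fft=4096, hop_length=1024,
                      window=window, return_complex=True)
    return stft.abs().unsqueeze(0)

for track in mus:
    # A real script would chunk tracks into short segments to fit in memory.
    mix = torch.from_numpy(track.audio.T).float()  # (2, samples)
    vocals = torch.from_numpy(track.targets["vocals"].audio.T).float()

    optimizer.zero_grad()
    estimate = model(magnitude(mix))  # assumed to output a magnitude spectrogram
    loss = criterion(estimate, magnitude(vocals))
    loss.backward()
    optimizer.step()
```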