ws-choi / Conditioned-Source-Separation-LaSAFT

A PyTorch implementation of the paper: "LaSAFT: Latent Source Attentive Frequency Transformation for Conditioned Source Separation" (ICASSP 2021)
MIT License

PROBLEMS AND QUESTIONS #22

Closed lucasdobr15 closed 3 years ago

lucasdobr15 commented 3 years ago

Hi, sorry to bother you,

I would like to know how to train a model using my own dataset. I have already trained my own models with tsurumeso's vocal-remover, and now I would like to train a new model using the wonderful LaSAFT.

The directory structure of the vocal-remover dataset looks like this:

path/to/dataset/
+- instruments/
|  +- 01_foo_inst.wav
|  +- 02_bar_inst.mp3
|  +- ...
+- mixtures/
   +- 01_foo_mix.wav
   +- 02_bar_mix.mp3
   +- ...

1. Does LaSAFT also allow training a model using only mixture and instrumental tracks?

2. I would like to congratulate you on being in 1st place in the Music Demixing (MDX) Challenge; you are a genius, and you deserve the award and much more!

3. Also, over these past months, have you trained any other .ckpt models for LaSAFT? If so, could you share them with those of us who appreciate your work so much?

4. Reporting a problem: why, when I use LaSAFT, do the accompaniment/stems crash during the song? This problem occurs in both the 2-stem and the 4-stem versions.

Sorry for the many questions, but this is the only place where I can get in touch with you. Your work is beautiful.

I look forward to hearing from you,

Yours truly, Lucas Rodrigues.

ws-choi commented 3 years ago

Hi again, @lucasdobr15! It seems that you want to build a vocal remover. If that is not the case, please let me know.

  1. LaSAFT+GPoCM was originally designed for conditioned source separation, where we want to separate multiple sources with a single model instance. You can still train our model as a vocal remover with your own dataset, though. All you need to do is customize the data provider for your model: https://github.com/ws-choi/Conditioned-Source-Separation-LaSAFT/tree/main/lasaft/data (see the sketch after this list). Alternatively, you might prefer our previous model: https://github.com/ws-choi/ISMIR2020_U_Nets_SVS. Although it was originally proposed for singing voice separation, we found that it also works for other sources such as drums, so I think it would work for the vocal-removing task as well.

  2. Thank you very much for your interest!

  3. For the past several months we have been developing models for another MIR problem, Audio Manipulation on Specified Sources (AMSS), rather than source separation: https://github.com/kuielab/AMSS-Net. So we have not spent much time re-training LaSAFT+GPoCM models. You can download three different ckpts for large models from http://intelligence.korea.ac.kr/assets/ , or load them via PreTrainedLaSAFTNet(model_name='lasaft_large_2020') (see the second sketch after this list). If you encounter any related problems, please let me know. Also, we trained a lightweight version of our model, called LightSAFT, for the MDX challenge. Check the following link for details: https://github.com/ws-choi/music-demixing-challenge-starter-kit/blob/master/test_lightsaft.py

  4. What does `crash` mean? Are the processes killed by the OS? Do the separated results contain some weird noises? Or do you perceive a kind of glitch where the context shifts unnaturally?
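
Regarding 1, here is a minimal sketch of what a data provider for (mixture, instrumental) pairs could look like. It is not the actual provider in lasaft/data; the class name MixtureInstrumentalDataset, the WAV-only file layout, and the fixed-length chunking are assumptions for illustration, so please adapt it to the interfaces in lasaft/data.

```python
# Hypothetical PyTorch Dataset for (mixture, instrumental) pairs.
# Assumes root/mixtures/*.wav and root/instruments/*.wav in matching sorted order;
# convert MP3s to WAV first or swap in another audio loader.
from pathlib import Path

import numpy as np
import soundfile as sf
import torch
from torch.utils.data import Dataset


class MixtureInstrumentalDataset(Dataset):
    def __init__(self, root, chunk_size=44100 * 3):
        root = Path(root)
        self.mix_files = sorted((root / 'mixtures').glob('*.wav'))
        self.inst_files = sorted((root / 'instruments').glob('*.wav'))
        assert len(self.mix_files) == len(self.inst_files)
        self.chunk_size = chunk_size  # assumes every file is longer than this

    def __len__(self):
        return len(self.mix_files)

    def __getitem__(self, idx):
        mix, _ = sf.read(self.mix_files[idx], dtype='float32')   # (time, channels)
        inst, _ = sf.read(self.inst_files[idx], dtype='float32')
        # take a random fixed-length chunk so every item in a batch has the same length
        offset = np.random.randint(0, max(1, mix.shape[0] - self.chunk_size))
        mix = mix[offset:offset + self.chunk_size]
        inst = inst[offset:offset + self.chunk_size]
        return torch.from_numpy(mix), torch.from_numpy(inst)
```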
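
Regarding 3, a minimal sketch of loading a pre-trained checkpoint: only PreTrainedLaSAFTNet(model_name='lasaft_large_2020') is stated above; the import path and the separate_track() call follow the README, so please double-check them against the current code.

```python
# Sketch: load a pre-trained large model and separate one target from a mixture.
# The import path and separate_track() are assumed from the repository README.
import soundfile as sf
from lasaft.pretrained import PreTrainedLaSAFTNet

model = PreTrainedLaSAFTNet(model_name='lasaft_large_2020')  # downloads the ckpt on first use

audio, sr = sf.read('mixture.wav', dtype='float32')  # stereo mixture, 44.1 kHz expected
vocals = model.separate_track(audio, 'vocals')       # other targets: 'drums', 'bass', 'other'
sf.write('vocals.wav', vocals, sr)
```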

Best, Woosung Choi.