This would be VERY useful!
I'm in the process of starting from scratch myself to improve my models. A lot of the advice and feedback you've been giving has been super helpful. Please feel free to share any new discoveries!
@Anjok07 I also started remaking my dataset. This time I used only the purest studio acapellas to create the mixes. I rejected a lot of multitracks and kept only really HQ stuff, mostly of lossless quality (plus a small amount of mp3/ogg).
From the instrumental and the mix, I remove everything that raises doubts: vocal chops, vocalises, etc. I also cut out all the long intros and outros without vocals (15-20 seconds or more). If the vocals are dry, I use effects like EQ, exciter, reverb, and delay to simulate a "commercial" sound. I also use a compressor so that the vocals are always louder than the instrumental.
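For example, the intro/outro trimming could be scripted roughly like this (just a sketch, not my actual workflow: it assumes the vocal and instrumental stems are already sample-aligned, and the `top_db`/`max_lead` values are guesses that need tuning by ear):

```python
import librosa
import soundfile as sf

def trim_pair(vocal_path, inst_path, out_vocal, out_inst,
              top_db=40, max_lead=15.0, sr=44100):
    """Cut the pair so at most `max_lead` seconds of vocal-free audio
    remain before the first and after the last vocal region."""
    vocals, _ = librosa.load(vocal_path, sr=sr, mono=False)
    inst, _ = librosa.load(inst_path, sr=sr, mono=False)
    # Locate non-silent regions in the vocal stem only.
    intervals = librosa.effects.split(librosa.to_mono(vocals), top_db=top_db)
    if len(intervals) == 0:
        return  # no vocals at all; reject this pair instead
    pad = int(max_lead * sr)
    start = max(0, intervals[0][0] - pad)
    end = min(vocals.shape[-1], intervals[-1][1] + pad)
    # soundfile expects (frames, channels); librosa gives (channels, frames)
    sf.write(out_vocal, vocals[..., start:end].T, sr)
    sf.write(out_inst, inst[..., start:end].T, sr)
```

The "commercial" vocal chain could also be scripted, e.g. with Spotify's pedalboard library; the plugins and settings below are placeholders, not my chain (pedalboard has no exciter, so a high-shelf boost stands in for one):

```python
from pedalboard import Pedalboard, Compressor, Delay, HighShelfFilter, Reverb
from pedalboard.io import AudioFile

board = Pedalboard([
    HighShelfFilter(cutoff_frequency_hz=8000, gain_db=3.0),  # crude stand-in for an exciter
    Compressor(threshold_db=-18, ratio=4),                   # keep the vocal level consistent
    Delay(delay_seconds=0.25, mix=0.15),
    Reverb(room_size=0.3, wet_level=0.2),
])

with AudioFile("vocal_dry.wav") as f:
    audio, rate = f.read(f.frames), f.samplerate
with AudioFile("vocal_fx.wav", "w", rate, audio.shape[0]) as f:
    f.write(board(audio, rate))
```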
Unfortunately, this is a very long process that will take several months, but I think it is worth it. At the moment I have only 250 pairs; my nearest target is 400-500 (November), then 700-1000 (next year).
@aufr33 I've been like a mad data scientist trying to find the right balance of pairs. I started with a 750-pair dataset and ended up purging a good 40% of it because I kept getting poor results. Other than examining the spectrograms and waveforms, I found that another good way to weed out potentially bad pairs is to -
The quality of my models has skyrocketed since I started doing this. I've also experimented with different bitrates and didn't find much of a difference, if any at all. Definitely keep it up! As of right now I've narrowed my dataset down to roughly 450 pairs. I'll be posting a model trained on them once it finishes. I wouldn't be opposed to sharing my dataset with you if you'd like to check it out or just use it in conjunction with yours, and I wouldn't be opposed to examining and using yours as well.
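For the spectrogram checks, something along these lines works for eyeballing a pair side by side (just a sketch; the display settings are arbitrary):

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

def show_pair(mix_path, inst_path, sr=44100):
    """Plot mixture and instrumental spectrograms one above the other."""
    fig, axes = plt.subplots(2, 1, figsize=(12, 6), sharex=True)
    for ax, path, title in zip(axes, (mix_path, inst_path),
                               ("mixture", "instrumental")):
        y, _ = librosa.load(path, sr=sr, mono=True)
        db = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
        librosa.display.specshow(db, sr=sr, x_axis="time", y_axis="log", ax=ax)
        ax.set_title(title)
    fig.tight_layout()
    plt.show()
```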
The model I have in training now is at about 17 epochs and is performing very well. The vocal tracks have almost no instrumental bleed through and the instrumentals sound clean and smooth.
@Anjok07 Are you training on version 3.0.1? Is there a significant difference in inference quality and training speed compared to 2.2?
I can send you part of my dataset, as much as my internet connection will allow; the whole dataset is 25 GB. Please leave your email address.
We can combine our datasets into one, but for that we need to identify each song in order to remove duplicates. Surely many of the songs in our datasets will be the same.
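A crude first pass could compare short chroma fingerprints, roughly like the sketch below (just an idea, not something I already use: the 0.95 threshold is a guess, and a mean chroma vector is coarse enough that matches would still need a manual check):

```python
import itertools
from pathlib import Path

import librosa
import numpy as np

def fingerprint(path, duration=60.0, sr=22050):
    """L2-normalized mean chroma over the first `duration` seconds."""
    y, _ = librosa.load(str(path), sr=sr, mono=True, duration=duration)
    v = librosa.feature.chroma_cqt(y=y, sr=sr).mean(axis=1)
    return v / (np.linalg.norm(v) + 1e-9)

def possible_duplicates(dir_a, dir_b, threshold=0.95):
    """Yield filename pairs from the two datasets that look like the same song."""
    a = [(p, fingerprint(p)) for p in sorted(Path(dir_a).glob("*.wav"))]
    b = [(p, fingerprint(p)) for p in sorted(Path(dir_b).glob("*.wav"))]
    for (pa, fa), (pb, fb) in itertools.product(a, b):
        if float(np.dot(fa, fb)) >= threshold:
            yield pa.name, pb.name
```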
@aufr33 I trained on both and found that v3 is a little quicker and the vocal removals are more aggressive than 2.2.0's, but the audio quality of the instrumentals is noticeably degraded on v3. I prefer cleaner instrumentals, so I stick mostly to v2.2.0, though I plan on continuing to train models for both and comparing them. Training doesn't take too long on my PC, and that's enabled me to do A LOT of testing.
My email is anjok0314@gmail.com
@Anjok07 Yes, the sound quality now resembles Spleeter's. I also like 2.2 better, but so far I've only compared the stock models. Perhaps this can be fixed with new command-line arguments?
I sent you the first part of my dataset.
@aufr33 Just got back to checking this today, thanks! I'll respond here shortly
After reading this article (chapter 2.2), I learned a very important thing: the same artist should not appear in both the training dataset and the validation dataset. But the train.py script splits the dataset randomly, preventing me from distributing the tracks manually.
Therefore, I suggest organizing the directories as follows:
dataset/
├── training/
│   ├── instruments/
│   │   ├── 01_foo_inst.wav
│   │   ├── 02_bar_inst.wav
│   │   └── ...
│   └── mixtures/
│       ├── 01_foo_mix.wav
│       ├── 02_bar_mix.wav
│       └── ...
└── validation/
    ├── instruments/
    │   ├── 03_foo_inst.wav
    │   ├── 04_bar_inst.wav
    │   └── ...
    └── mixtures/
        ├── 03_foo_mix.wav
        ├── 04_bar_mix.wav
        └── ...
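A minimal sketch of how train.py could then build the file lists from these directories instead of splitting randomly, assuming pairs are matched by the shared filename prefix and the `_mix`/`_inst` suffixes shown above (this function does not exist in the script today):

```python
import os

def make_pair_list(root, subset):
    """Return [(mix_path, inst_path), ...] for dataset/<subset>/."""
    inst_dir = os.path.join(root, subset, "instruments")
    mix_dir = os.path.join(root, subset, "mixtures")
    pairs = []
    for name in sorted(os.listdir(mix_dir)):
        stem, ext = os.path.splitext(name)
        if not stem.endswith("_mix"):
            continue
        # 01_foo_mix.wav in mixtures/ pairs with 01_foo_inst.wav in instruments/
        inst_path = os.path.join(inst_dir, stem[:-len("_mix")] + "_inst" + ext)
        if not os.path.exists(inst_path):
            raise FileNotFoundError(f"no instrumental for {name}")
        pairs.append((os.path.join(mix_dir, name), inst_path))
    return pairs

train_pairs = make_pair_list("dataset", "training")
val_pairs = make_pair_list("dataset", "validation")
```

That way the artist split stays fully under manual control.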