tsurumeso / vocal-remover

Vocal Remover using Deep Neural Networks
MIT License

Training model using only 1 instead of 5 dataset pairs #154

Open maxmusterm4nn opened 7 months ago

maxmusterm4nn commented 7 months ago

Hello tsurumeso,

First of all, I’m very pleased with your vocal-remover project!

I would like to train my own model using only 1 dataset pair (instrumental + mixture) instead of the default 5 pairs.

Could you please advise which settings I should change to do so?

I have football matches that are ca. 90-100 minutes long and contain multiple audio tracks, with and without commentary. I’d like to use those sources to train my model one by one, match by match.

Do you think it would work for audio of this length?

Furthermore, I’d like to buy an AMD Radeon RX 7900 XTX graphics card. Do you have any experience with training models on an AMD GPU?

Thank you in advance for your help! :-)

aufr33 commented 7 months ago

You can split the big pair into segments:

ffmpeg -i big_mix.wav -f segment -segment_time 300 -c copy %03d_mix.wav
ffmpeg -i big_inst.wav -f segment -segment_time 300 -c copy %03d_inst.wav

You definitely need validation pairs, about 20% of the entire dataset. The validation data must be different from the training data.
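A rough sketch of how the 20% hold-out could be done on the segments written by the ffmpeg commands above. This is not code from vocal-remover; `find_pairs` and `split_pairs` are hypothetical helper names, and the only thing assumed about the files is the `NNN_mix.wav` / `NNN_inst.wav` naming those commands produce. The function returns two lists that share no pair, so you also know exactly which pairs ended up as validation data.

```python
import random
from pathlib import Path


def find_pairs(segment_dir):
    """Match each NNN_mix.wav with its NNN_inst.wav in segment_dir (hypothetical helper)."""
    segment_dir = Path(segment_dir)
    pairs = []
    for mix in sorted(segment_dir.glob("*_mix.wav")):
        inst = mix.with_name(mix.name.replace("_mix.wav", "_inst.wav"))
        if inst.exists():  # skip a mixture segment that has no instrumental partner
            pairs.append((mix, inst))
    return pairs


def split_pairs(pairs, val_fraction=0.2, seed=0):
    """Return (train, val); about val_fraction of the pairs go to val, none to both."""
    if len(pairs) < 2:
        raise ValueError("need at least two pairs to hold one out for validation")
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]
```

With 300-second segments, a 90-100 minute match gives 18 to 20 pairs, so this would hold out 4 of them for validation. How the two lists are then handed to the training script depends on the project's own options, which are not covered here.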

maxmusterm4nn commented 7 months ago

Thank you for your advice!

By "different", do you mean that in my case (where the football stadium crowd noise takes the place of the instrumental, and crowd noise + commentary is the mixture) the validation data should contain, for example, music + vocals?

How can I find out which pairs of the dataset will be used as the validation data?