madebyollin / acapellabot

Acapella Extraction with a ConvNet
http://madebyoll.in/posts/cnn_acapella_extraction/

how to train this model by myself? #4

Open vell001 opened 7 years ago

vell001 commented 7 years ago

Hello, how can I train this model myself? Could I have your dataset, or instructions for how to build one myself?

madebyollin commented 7 years ago

I talk a bit about my data collection process here.

I think you can probably get away with much less; the only really important part about the data is that the model can use it to learn to isolate acapellas from background noise. The minimum viable data collection process would be (assuming you don't want to modify the data loader at all):

If you want to be a bit more careful, you can key-tag the songs with KeyFinder and use the Camelot key as the number, so that the data processor only makes mashups of songs in the same key. You can also use Audacity to adjust the tempo of the acapellas so that they're standardized. For reference, some of my data looks like:

[screenshot of example training data files, 2017-06-02]
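The same-key constraint above can be sketched in a few lines. This is a hypothetical helper, not the repo's actual data processor: it assumes filenames are prefixed with a Camelot key (e.g. `8A_title_acapella.wav`), which is an illustrative naming convention only.

```python
from collections import defaultdict
from itertools import product

def group_by_key(filenames):
    """Bucket track filenames by their Camelot key prefix.

    Assumes names like "8A_title_acapella.wav" / "8A_title_instrumental.wav"
    (hypothetical convention for illustration).
    """
    groups = defaultdict(lambda: {"acapella": [], "instrumental": []})
    for name in filenames:
        key, _, rest = name.partition("_")
        kind = "acapella" if "acapella" in rest else "instrumental"
        groups[key][kind].append(name)
    return groups

def same_key_mashups(filenames):
    """Pair every acapella with every instrumental that shares its key,
    so mashups never mix clashing keys."""
    pairs = []
    for tracks in group_by_key(filenames).values():
        for acapella, instrumental in product(tracks["acapella"],
                                              tracks["instrumental"]):
            pairs.append((acapella, instrumental))
    return pairs
```

With tempo also standardized in Audacity, each resulting pair can be mixed directly into a plausible-sounding training input.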

I hope that helps!

vell001 commented 7 years ago

Thanks a lot~ I have another question: your model doesn't separate very well when the song has certain unusual drumbeats, like this one: test001.mp3.zip

My question is: can I solve this problem by training your model with more data containing these drumbeats? I'm not sure whether or not the problem is caused by a lack of data.

madebyollin commented 7 years ago

Ah, that's an interesting example! I'm working on an updated version of the model that fixes several architectural problems, but it still can't remove all of the drums in your example (sample output).

I think this is a problem that more diverse training data will probably solve; I'm training exclusively on 128 BPM EDM, which doesn't have much variation in drum samples.

That said, I'm not sure that more data will help the current stable version of the model that I've posted to GitHub. You can probably get it to filter out the drums in your example, but likely at the expense of removing legitimate vocals in other songs (the new model architecture is a bit smarter about this).

For now, I'd try collecting more data in this style, training a model, and seeing if it does better. Even if it doesn't work, you'll have the data ready for training on the new architecture once it's working properly 👍
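Whichever model version you train on, the network consumes magnitude spectrograms rather than raw waveforms, so new data has to pass through a spectrogram step first. As a rough illustration of that step, here is a minimal NumPy sketch; the repo's actual STFT parameters, windowing, and scaling may differ:

```python
import numpy as np

def magnitude_spectrogram(wave, frame_len=1024, hop=512):
    """Naive STFT magnitude: Hann-windowed frames -> |rFFT|.

    A stand-in for the repo's real preprocessing (parameters are
    illustrative). Returns shape (num_frames, frame_len // 2 + 1).
    """
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(wave) - frame_len + 1, hop):
        frame = wave[start:start + frame_len] * window
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.stack(frames)
```

Training pairs would then be (spectrogram of mashup, spectrogram of acapella), with the model learning the mapping between the two.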

vell001 commented 7 years ago

OK, thanks! Please let me know if you make any progress; I'll try training your model with more data in this special style. Thanks again~