Closed 3 years ago
The standard model was trained on just 170 songs (pairs). You have to train your own model. I used 350 songs and already got slightly better results.
In any case, there is no AI today that can cleanly remove the vocals from every song.
Tell me how to train my own model.
Read the README file. Assembling a good dataset takes a long time and is difficult. You need to prepare pairs of instrumental and mixture tracks. I downloaded multitracks and mixed every song myself in audio editing software.
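As a rough sanity check when pairing files, something like this can flag mixes and instrumentals whose lengths do not match (a minimal sketch using only the standard library; `aligned` and `write_silence` are hypothetical helper names, and a real dataset needs proper waveform alignment, not just a length check):

```python
import struct
import tempfile
import wave

def wav_frames(path):
    """Number of audio frames in a WAV file."""
    with wave.open(path, "rb") as w:
        return w.getnframes()

def aligned(mix_path, inst_path, tol_frames=0):
    """True if the mix and instrumental lengths differ by at most tol_frames."""
    return abs(wav_frames(mix_path) - wav_frames(inst_path)) <= tol_frames

def write_silence(path, n_frames, rate=44100):
    """Write n_frames of 16-bit mono silence (stand-in for real audio)."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(struct.pack("<%dh" % n_frames, *([0] * n_frames)))

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        mix, inst = d + "/mix.wav", d + "/inst.wav"
        write_silence(mix, 44100)        # 1 s "mix"
        write_silence(inst, 44100 + 50)  # instrumental 50 frames longer
        print(aligned(mix, inst))                  # exact match required
        print(aligned(mix, inst, tol_frames=100))  # small tolerance allowed
```

Pairs that fail even this coarse check are the ones most likely to hurt training.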
I did not understand. Can you explain it to me in a video?
Datasets can take a LONG time to assemble.
I've found in my own experience that not every official studio instrumental will align with its official mix counterpart.
By the way, it is not necessary to use only official instrumentals. I also used "homemade" instrumentals + choral singing. And even voice samples from the Voyager Golden Records + classical music.
> I've found in my own experience that not every official studio instrumental will align with its official mix counterpart.
>
> By the way, it is not necessary to use only official instrumentals. I also used "homemade" instrumentals + choral singing. And even voice samples from the Voyager Golden Records + classical music.
You can do that as well, but I haven't had much luck using homemade instrumentals. The issue is that any lingering vocals (even background vocals) will compromise the effectiveness of the model. I find it's better to train the AI on instrumentals with absolutely no vocals in them at all.
If background vocals are needed for karaoke, it's best to plug them in after conversions.
@aufr33 @TRvlvr
You can also download all the beatport stems (15000+), which are 2-minute previews of full tracks, split into 4 stems of decent quality, using this python script: https://gist.github.com/kylemcdonald/e20ec59273e78d0075bd71a1d08f4c41
Alter line 12 from `total_tracks = 4106` to `total_tracks = 15400`, and line 34 from `print 'Error: ' + r.status` to `print('Error: ' + r.status)`, since Python 3 requires print to be called as a function.
Make sure Python 3.x is installed.
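For reference, the patched download loop might look roughly like this in Python 3 (a sketch only, not the gist's actual code; `STEM_URL` is a hypothetical placeholder for the real URL template that the gist builds):

```python
import os
import urllib.error
import urllib.request

# Hypothetical URL template -- the real one lives in the gist linked above.
STEM_URL = "https://example.com/stems/{}.mp3"
total_tracks = 15400  # raised from the gist's default of 4106

def fetch(track_id, out_dir="."):
    """Download one stem preview; return True on success, False on error."""
    try:
        with urllib.request.urlopen(STEM_URL.format(track_id)) as r:
            data = r.read()
        with open(os.path.join(out_dir, "%d.mp3" % track_id), "wb") as f:
            f.write(data)
        return True
    except urllib.error.URLError as e:
        # In Python 3, print is a function -- hence the edit to line 34.
        print("Error: " + str(e))
        return False

# Usage (commented out so the sketch does not hammer the server):
# for track_id in range(total_tracks):
#     fetch(track_id)
```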
I also have the torrent "The Largest Multitrack Music Collection Ever! 2013" (66.3 GB) downloaded, which contains 1385 studio-ripped multitracks. Let me know if you're interested.
> The issue is that any lingering vocals (even background vocals) will compromise the effectiveness of the model.
No, I meant that I used a pair of "clean music and my own mix".
Although, an additional model trained on instrumentals with backing vocals is a good idea. I have already tried and got funny results. My "backing" model really keeps the backing vocals, but it works worse.
But I still do not understand what to do with vocal chops and vinyl scratches. They are part of the music, but on the other hand they are a human voice, so the two goals contradict each other. For now, I am cutting out all the dubious sounds, just in case.
> You can also download all the beatport stems (15000+), which are 2-minute previews of full tracks,
I doubt that 96 kbps MP3 is good enough quality for the dataset, but thanks anyway for the link.
@aufr33 Since Spleeter uses a 16 kHz filter, I don't think 96 kbps will affect it as much. See also: https://github.com/deezer/spleeter/wiki/2.-Getting-started#using-models-up-to-16khz
It is true that many people do not hear above 16 kHz. But that is not the only loss in quality: there are also losses at mid frequencies, smoothed transients, etc. Personally, I would use such mp3 files only in small amounts to expand the dataset.
By the way, I used a non-standard sample rate of 36750 Hz. Compared to 44100, inference is a bit faster.
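The ratio works out cleanly: 36750 = 44100 × 5/6, so every second of audio carries 5/6 as many samples, which is where the inference speedup comes from. A naive linear-interpolation resampler illustrates the sample count (illustration only; real dataset audio should go through a proper resampler such as SoX or librosa):

```python
def resample(samples, src_rate, dst_rate):
    """Naive linear-interpolation resampler (for illustrating the ratio;
    a polyphase resampler is needed for production-quality audio)."""
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate   # fractional position in the source
        j = int(pos)
        frac = pos - j
        a = samples[j]
        b = samples[min(j + 1, len(samples) - 1)]
        out.append(a + (b - a) * frac)  # linear interpolation
    return out

# One second at 44100 Hz becomes 36750 samples at the non-standard rate.
one_second = [0.0] * 44100
print(len(resample(one_second, 44100, 36750)))  # 36750
```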
What is in the baseline.pth file in vocal-remover?
I also want to note that the use of some songs in the dataset (for example, Gorillaz - Feel Good Inc) led to inadequate results. I had no choice but to remove them and start the learning process again.
I tried some songs, and the singer's voice was not removed from them. Please make your vocal remover remove all vocals from all songs.