tsurumeso / vocal-remover

Vocal Remover using Deep Neural Networks
MIT License

remove vocal from song #32

Closed 404000 closed 3 years ago

404000 commented 4 years ago

I tried some songs, but the singer's voice was not removed from them. Please make your vocal remover remove all vocals from all songs.

aufr33 commented 4 years ago

The standard model was trained on just 170 songs (pairs). You need to train your own model. I used 350 songs and already got slightly better results.

In any case, no AI today can remove the vocals from every song.

404000 commented 4 years ago

Tell me how to train your own model.

aufr33 commented 4 years ago

> Tell me how to train your own model.

Read the README file. Assembling a good dataset takes a long time and is difficult. For each song you need to prepare a pair: the instrumental and the mix. I downloaded multitracks and mixed every song myself in audio editing software.
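A minimal sketch of building one such pair from multitrack stems, in case it helps. It assumes the stems are already decoded to mono float samples of equal length; the function name `build_pair` and the stem key `"vocals"` are made up for this illustration and are not part of vocal-remover. Plain lists are used only to keep the example dependency-free.

```python
from typing import Dict, List, Tuple


def build_pair(stems: Dict[str, List[float]],
               vocal_keys: Tuple[str, ...] = ("vocals",)) -> Tuple[List[float], List[float]]:
    """Return (instrumental, mix) built from equal-length mono stems."""
    lengths = {len(samples) for samples in stems.values()}
    if len(lengths) != 1:
        raise ValueError("all stems must have the same number of samples")
    n = lengths.pop()

    instrumental = [0.0] * n
    mix = [0.0] * n
    for name, samples in stems.items():
        for i, s in enumerate(samples):
            mix[i] += s                  # the mix contains every stem
            if name not in vocal_keys:
                instrumental[i] += s     # the instrumental skips the vocal stems

    # Scale both signals by the same factor so the pair stays consistent
    # (mix minus instrumental must still equal the vocals).
    peak = max((abs(s) for s in mix + instrumental), default=0.0)
    if peak > 1.0:
        instrumental = [s / peak for s in instrumental]
        mix = [s / peak for s in mix]
    return instrumental, mix


# Hypothetical four-sample stems, just to show the call.
stems = {
    "drums": [0.5, -0.5, 0.25, 0.0],
    "bass": [0.25, 0.25, -0.25, 0.0],
    "vocals": [0.5, 0.0, 0.0, -0.5],
}
instrumental, mix = build_pair(stems)
```

The point of scaling both outputs together is that normalizing the instrumental and the mix separately would change their relative levels, and the pair would no longer differ only by the vocals.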

404000 commented 4 years ago

I did not understand. Please explain it to me in a video.

TRvlvr commented 4 years ago

Datasets can take a LONG time to assemble.

aufr33 commented 4 years ago

I've found in my own experiences that not every official studio instrumental will align with its official mix counterpart.
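One way to check a pair for this kind of misalignment is to look for the sample offset at which the instrumental correlates best with the mix. The following is only a rough sketch under assumptions: both signals are mono, at the same sample rate, and differ by a constant shift; `estimate_lag` is a name invented here, not something from vocal-remover.

```python
from typing import Sequence


def estimate_lag(mix: Sequence[float], inst: Sequence[float], max_lag: int) -> int:
    """Return the lag (in samples) that best aligns inst with mix.

    A positive result means the instrumental starts later than the mix,
    i.e. inst[i + lag] corresponds to mix[i].
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation at this lag, over the overlapping samples only.
        score = 0.0
        for i, m in enumerate(mix):
            j = i + lag
            if 0 <= j < len(inst):
                score += m * inst[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy signals: the "instrumental" is the "mix" delayed by two samples.
mix = [0.0, 0.0, 1.0, -1.0, 0.5, 0.0, 0.0, 0.0]
inst = [0.0, 0.0, 0.0, 0.0, 1.0, -1.0, 0.5, 0.0]
print(estimate_lag(mix, inst, max_lag=3))  # prints 2
```

This brute-force loop is far too slow for whole songs; on real audio an FFT-based routine such as `scipy.signal.correlate` on a short excerpt would be the practical choice. A nonzero lag, or a weak correlation peak, suggests the pair should be fixed or dropped.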

By the way, it is not necessary to use only official instrumentals. I also used "homemade" instrumentals + choral singing. And even voice samples from the Voyager Golden Records + classical music.

TRvlvr commented 4 years ago

> I've found in my own experiences that not every official studio instrumental will align with its official mix counterpart.
>
> By the way, it is not necessary to use only official instrumentals. I also used "homemade" instrumentals + choral singing. And even voice samples from the Voyager Golden Records + classical music.

You can do that as well, but I haven't had much luck using homemade instrumentals. The issue being that any lingering vocals (even background vocals) will compromise the effectiveness of the model. I find it's better to train the AI on instrumentals with absolutely no vocals in them at all.

If background vocals are needed for karaoke, it's best to plug them in after conversions.

bascurtiz commented 4 years ago

@aufr33 @TRvlvr

You can also download all the Beatport stems (15000+), which are 2-minute previews of full tracks split into 4 stems in decent quality, using this Python script: https://gist.github.com/kylemcdonald/e20ec59273e78d0075bd71a1d08f4c41

Alter line 12: total_tracks = 4106 to total_tracks = 15400

Alter line 34: print 'Error: ' + r.status to print('Error: ' + r.status)

Make sure Python 3.x is installed.
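A caveat on the print change: in Python 3, concatenating a string with an integer raises TypeError, so the edited line only works if r.status is a string. The thread does not say which type it is, so the sketch below assumes it might be an integer and sidesteps the problem with an f-string. `FakeResponse` and `format_error` are stand-ins made up for this example, not names from the script.

```python
class FakeResponse:
    """Stand-in for the HTTP response object in the script (hypothetical)."""
    status = 404  # assumed to be an integer; the real attribute may differ


def format_error(r) -> str:
    # f-string formatting works whether r.status is an int or a str,
    # whereas 'Error: ' + r.status raises TypeError for an int.
    return f"Error: {r.status}"


print(format_error(FakeResponse()))  # prints "Error: 404"
```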

I have also downloaded the "The Largest Multitrack Music Collection Ever! 2013" torrent (66.3 GB), which contains 1385 studio-ripped multitracks. Let me know if you're interested.

aufr33 commented 4 years ago

> The issue being that any lingering vocals (even background vocals) will compromise the effectiveness of the model.

No, I meant that I used a pair of "clean music and my own mix".

That said, an additional model trained on instrumentals with backing vocals is a good idea. I have already tried it and got funny results. My "backing" model really does keep the backing vocals, but it performs worse overall.

But I still do not understand what to do with vocal chops and vinyl scratches. They are part of the music, but they are also a human voice, and those two facts conflict. So far I'm cutting out all the dubious sounds, just in case.

> You can also download all the beatport stems (15000+) which are 2 mins previews of full tracks,

I doubt that 96 kbps MP3 is good enough quality for the dataset, but thanks anyway for the link.

bascurtiz commented 4 years ago

@aufr33 Since Spleeter uses a 16 kHz filter, I don't think 96 kbps will affect it as much. See also: https://github.com/deezer/spleeter/wiki/2.-Getting-started#using-models-up-to-16khz


aufr33 commented 4 years ago

It is true that many people cannot hear above 16 kHz. But that is not the only loss of quality. There are also losses at mid frequencies, smoothed transients, etc. Personally, I would use such MP3 files only in small quantities to expand the dataset.

By the way, I used a non-standard sample rate of 36750 Hz. Compared to 44100, inference is a bit faster.
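For reference, the arithmetic behind those two rates is below. It only shows how the sample counts and the highest representable frequency compare; how much faster inference actually gets depends on the model and hardware, and nothing here measures that.

```python
from fractions import Fraction

standard_sr = 44100  # Hz
custom_sr = 36750    # Hz

ratio = Fraction(custom_sr, standard_sr)  # samples per second relative to 44100 Hz
nyquist_hz = custom_sr / 2                # highest frequency representable at 36750 Hz
samples_saved = 1 - ratio                 # fraction of samples no longer processed

print(ratio, nyquist_hz, samples_saved)   # prints: 5/6 18375.0 1/6
```

So 36750 Hz is exactly 5/6 of 44100 Hz, which means one sixth fewer samples per second of audio, while still covering frequencies up to 18375 Hz.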

404000 commented 4 years ago

What is in the baseline.pth file in vocal-remover?

aufr33 commented 4 years ago

I also want to note that including some songs in the dataset (for example, Gorillaz - Feel Good Inc) led to inadequate results. I had no choice but to remove them and start training again.