ibab / tensorflow-wavenet

A TensorFlow implementation of DeepMind's WaveNet paper
MIT License
5.41k stars, 1.29k forks

What is the rule of thumb for generating music with wavenet? #195

Open pennygalaxi opened 7 years ago

pennygalaxi commented 7 years ago

I have a large library of mp3 songs. What is the best way to process these songs in order to get a good result with wavenet?

So far, I tried the following approach which doesn't seem to work very well: convert mp3s to wav files (16 bits per sample) and then run the training script (with default parameters). My questions are as follows:

Please feel free to add any comments that you think are relevant to getting a good result. Thanks a lot!
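For reference, the conversion step described above (mp3 to 16-bit wav) could be sketched roughly as follows. This is only an illustration, not the repository's tooling: it assumes the `ffmpeg` command-line tool is installed, and the helper names `ffmpeg_command` and `convert_library`, the mono downmix, and the 16 kHz sample rate are all assumptions made for the example.

```python
import subprocess
from pathlib import Path


def ffmpeg_command(src, dst, sample_rate=16000):
    """Build an ffmpeg call that writes mono, 16-bit PCM wav (hypothetical helper)."""
    return [
        "ffmpeg", "-y",           # overwrite the output file if it exists
        "-i", str(src),           # input mp3
        "-ac", "1",               # downmix to mono (assumption)
        "-ar", str(sample_rate),  # resample, 16 kHz by default (assumption)
        "-c:a", "pcm_s16le",      # 16 bits per sample, little-endian PCM
        str(dst),
    ]


def convert_library(mp3_dir, wav_dir, sample_rate=16000):
    """Convert every .mp3 directly inside mp3_dir to a .wav file in wav_dir."""
    wav_dir = Path(wav_dir)
    wav_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(mp3_dir).glob("*.mp3")):
        dst = wav_dir / (src.stem + ".wav")
        subprocess.run(ffmpeg_command(src, dst, sample_rate), check=True)
```

Calling `convert_library("mp3s", "wavs")` (hypothetical directory names) would then produce one wav file per mp3, ready to point the training script at.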

veqtor commented 7 years ago

Depending on what you're trying to train on, it might be that the net is trying to find a "universal" solution that just doesn't exist unless local and/or global conditioning is introduced. A longer receptive field might solve some issues; the current settings yield a receptive field of ~250 ms at 16 kHz.
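To make the receptive field figure concrete, here is a minimal sketch of how it can be estimated for a stack of dilated causal convolutions, using the standard relation 1 + (filter_width - 1) * sum(dilations). The dilation list below (four stacks of 1 to 512 with filter width 2) is a hypothetical configuration chosen for illustration; the repository's actual defaults may differ, and its own computation may add a small offset for the initial causal layer.

```python
def receptive_field_samples(dilations, filter_width=2):
    """Receptive field, in samples, of stacked dilated causal convolutions."""
    return 1 + (filter_width - 1) * sum(dilations)


def receptive_field_ms(dilations, filter_width=2, sample_rate=16000):
    """Same receptive field expressed in milliseconds at the given sample rate."""
    return 1000.0 * receptive_field_samples(dilations, filter_width) / sample_rate


# Hypothetical configuration: dilations 1, 2, 4, ..., 512, repeated four times.
example_dilations = [2 ** i for i in range(10)] * 4

if __name__ == "__main__":
    print(receptive_field_samples(example_dilations))  # samples
    print(receptive_field_ms(example_dilations))       # milliseconds at 16 kHz
```

Under these assumptions the result is roughly a quarter of a second of audio; adding more dilation stacks or raising the largest dilation lengthens it, at the cost of more computation per step.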

At what step have you given up so far?

Try music that has the same instrumentation, genre and tempo; it might help. I don't think it was a coincidence that Google chose to train on classical piano music. In terms of sound generation, the network would then only have to build a representation of a piano.

I'd also suggest actually listening to whatever the network has found so far; it might be getting close to some kind of solution. If you have very diverse training material, it may need to train for a REALLY long time. I tried training on 8-bit music and reached some kind of simple solution quite quickly; speech seems to need around 88k steps. We don't know what is needed to generate convincing music (especially with complex timbres, variations on instruments such as electric guitar with effects, mastering affecting the inter-dynamics of instruments, and so on). The WaveNet paper briefly mentions using global conditioning that describes genre, tempo and some other inputs when training on arbitrary music sets.

neale commented 7 years ago

@veqtor is right with respect to training time. I used a 1070 and tried for quite some time to get good audio output using mp3 files. Here are some of the hints I found; they probably break down if your resources are large.

I hope some of this helps @pennygalaxi

devinplatt commented 7 years ago

Hey @Neale, thanks for the tips! :) I'm wondering if you could elaborate on a few of your points:

Thanks!

njbittner commented 7 years ago

@neale or @veqtor, any chance either of you would be willing to share one of your trained music models?