r9y9 / wavenet_vocoder

WaveNet vocoder
https://r9y9.github.io/wavenet_vocoder/

Music generation #13

Closed r9y9 closed 5 years ago

r9y9 commented 6 years ago

Things I want to try if I get a chance. Comments and requests are welcome.

imdatceleste commented 6 years ago

@r9y9, I've started training a model on Beethoven piano concertos. Learnings: use only data that has the same tempo (e.g. "Adagio" only); don't use too much data (I have about 20 minutes); use only single-instrument music.

Hope that helps. Initial results are not very promising so far. I only used "Click Remover" and "Amplify" on top of what wavenet_vocoder generated.

I would love to use 65536 quantization levels, but my GPU memory (11 GB) is not enough.

r9y9 commented 6 years ago

With a mixture of logistic distributions, we can reduce GPU memory usage drastically. For example, with a mixture of 10 logistic distributions, we have only 30 nodes in the output layer.

https://github.com/r9y9/wavenet_vocoder/blob/3cf66f5a6b0df55bca98fe4967f5bb2ca2fcb86d/hparams.py#L60-L62

One drawback of logistic distributions I noticed in my experiments is that it takes a long time (7 to 10 days) to reach sufficiently good quality in the generated speech.
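For reference, the arithmetic behind the memory saving can be sketched as follows (a minimal illustration; `mol_out_channels` is a hypothetical helper, not part of the repo):

```python
def mol_out_channels(num_mixtures: int) -> int:
    # Each logistic mixture component is parameterized by 3 numbers:
    # a mixture weight (logit), a mean, and a log-scale.
    return num_mixtures * 3

# 10 mixtures -> 30 output nodes, versus 65536 for a 16-bit softmax.
print(mol_out_channels(10))  # -> 30
```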

imdatceleste commented 6 years ago

Currently, I'm using out_channels = 256 because quantize_channels=256 and input_type=mulaw-quantize.

Even with these settings, it takes quite some time to generate music. WaveNet is computationally quite intensive (as you well know). I'm looking into areas where we can make it faster...
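For readers unfamiliar with the `mulaw-quantize` input type mentioned above, the standard mu-law companding to 256 levels can be sketched as follows (a minimal NumPy illustration of the textbook formula; function names are hypothetical and the repo has its own implementation):

```python
import numpy as np

def mulaw_quantize(x, mu=255):
    # Compand a float signal in [-1, 1] with mu-law, then map to
    # integer classes in [0, mu] (256 classes for mu = 255).
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return ((y + 1) / 2 * mu + 0.5).astype(np.int64)

def inv_mulaw_quantize(q, mu=255):
    # Map integer classes back to floats in [-1, 1] (lossy inverse).
    y = 2 * q.astype(np.float64) / mu - 1
    return np.sign(y) * (1 / mu) * ((1 + mu) ** np.abs(y) - 1)

x = np.array([0.0, 0.5, -0.5, 1.0, -1.0])
q = mulaw_quantize(x)        # e.g. silence (0.0) maps to class 128
x_hat = inv_mulaw_quantize(q)  # approximately recovers x
```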

SPGB commented 5 years ago

Sorry to bring up an old issue but would you be able to share the hyperparameters you're using @imdatsolak ?

I've mostly gotten static after almost a week of training, so any insights you might have would be useful! I'm preprocessing with `python3 preprocess.py --preset=presets/music_1.json librivox ~/data/cello ./data/cello` (librivox, since my corpus is long recordings of cello music and I see librivox splits it up into 8-second segments) and training with `python3 train.py --data-root=./data/cello --preset=presets/music_2.json`.

music_2.json is essentially cmu_arctic_8bit.json, but I reduced sample_rate to 8 kHz since I just want to prototype for now, and increased the learning rate. I'm also using mulaw-quantize with 256 out/quantize channels.
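The settings described above would correspond to a preset fragment along these lines (illustrative only; these are just the fields mentioned in the comment, and any remaining preset fields are inherited from cmu_arctic_8bit.json):

```json
{
  "input_type": "mulaw-quantize",
  "quantize_channels": 256,
  "out_channels": 256,
  "sample_rate": 8000
}
```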

aleksas commented 5 years ago

Shouldn't it be possible to generate solo vocal singing conditioned on MIDI? It might be mumbling, but melodic mumbling :) There is a Python library to extract MIDI from a song: Audio To Midi Melodia, based on the Melodia plugin. The demo of MIDI extracted from songs looks promising. What I couldn't find is a dataset of clean vocals.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.