ibab / tensorflow-wavenet

A TensorFlow implementation of DeepMind's WaveNet paper
MIT License

Trying to understand causality of local conditioning with fast generation #264

Open veqtor opened 7 years ago

veqtor commented 7 years ago

So, I've been toying with local conditioning lately and while I can see quite easily how it would be implemented for slow generation, I can't really wrap my head around how it would work with fast generation.

The DeepMind paper states that they used transposed convolutions; from this I think we need a method that learns the transposed upsampling during training.

We could then take chunks of, say, 256 label samples, upsample them to the audio rate with the learned upsampling, and condition on the resulting sample-rate labels. What I'm a bit unsure about is whether I could supply them in a feed dict just like the GC embedding, or whether there might be side effects from supplying an LC input vector at every generated step.
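For reference, here is a minimal sketch of what such a learned upsampling could look like in TF 1.x, treating time as the height of a 2-D map so `conv2d_transpose` can stretch it by the label-to-sample rate. The function name, shapes, and parameters are my own assumptions, not anything that exists in this repo:

```python
import tensorflow as tf

def upsample_conditioning(lc, rate, channels=16):
    """Hypothetical learned upsampling of local-conditioning features.

    lc:   [batch, num_labels, lc_channels] label/linguistic frames
    rate: integer factor so that num_labels * rate == audio length
    """
    # Add a dummy width dimension so the 1-D sequence can go through conv2d_transpose.
    x = tf.expand_dims(lc, 2)                       # [batch, num_labels, 1, lc_channels]
    x = tf.layers.conv2d_transpose(
        x, filters=channels,
        kernel_size=(rate * 2, 1),                  # kernel spans two label frames
        strides=(rate, 1), padding='same')          # stretches time by `rate`
    return tf.squeeze(x, 2)                         # [batch, num_labels * rate, channels]
```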

lemonzi commented 7 years ago

Well, the generation code as it is written now (one sess.run() call per sample) is inefficient. It should be rewritten into a native TensorFlow loop, in which case you could just upsample the vector and use it immediately.
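As a rough illustration of that in-graph approach, a `tf.while_loop` could index the already-upsampled conditioning at each step. `step_fn` below stands in for a single-step fast-generation pass and is not defined here; this is only a sketch of the loop structure:

```python
import tensorflow as tf

def generate(total_steps, lc_upsampled, step_fn, initial_sample):
    """Sketch of an in-graph sampling loop.

    lc_upsampled: [total_steps, lc_channels], conditioning already at audio rate
    step_fn:      hypothetical single-step WaveNet forward pass,
                  sample_t = step_fn(prev_sample, lc_t)
    """
    samples = tf.TensorArray(dtype=tf.float32, size=total_steps)

    def cond(t, prev, out):
        return t < total_steps

    def body(t, prev, out):
        lc_t = lc_upsampled[t]               # conditioning for this time step
        sample = step_fn(prev, lc_t)         # one autoregressive step; must keep a fixed shape
        return t + 1, sample, out.write(t, sample)

    _, _, samples = tf.while_loop(cond, body,
                                  [tf.constant(0), initial_sample, samples])
    return samples.stack()
```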

To get it working as it is now, I guess you would have to run the upsampling separately, get it as a numpy matrix, and then feed it sample by sample. Regarding side effects, you would need to either replicate the generation Queue for the conditioning or supply wider vectors to it. I think there's a PR that implements these new Queues.
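A sketch of that driver loop, assuming hypothetical names for the ops and placeholders (`upsampled_lc_op`, `next_sample_op`, `sample_placeholder`, `lc_placeholder`); these are not the repo's actual identifiers:

```python
import numpy as np

# Run the learned upsampling once, outside the per-sample loop.
lc_matrix = sess.run(upsampled_lc_op)        # [total_steps, lc_channels] as a NumPy array
waveform = [np.zeros(1)]                     # seed sample

for t in range(lc_matrix.shape[0]):
    prediction = sess.run(next_sample_op, feed_dict={
        sample_placeholder: waveform[-1],
        lc_placeholder: lc_matrix[t],        # one row of conditioning per generated sample
    })
    waveform.append(prediction)
```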