ibab / tensorflow-wavenet

A TensorFlow implementation of DeepMind's WaveNet paper
MIT License

Trying to understand causality of local conditioning with fast generation #264

Open veqtor opened 7 years ago

veqtor commented 7 years ago

So, I've been toying with local conditioning lately and while I can see quite easily how it would be implemented for slow generation, I can't really wrap my head around how it would work with fast generation.

The DeepMind paper states that they used transposed convolutions; from this I think we need a method that learns the transposed upsampling during training.

We could then take chunks of, say, 256 label samples, upsample them to the audio rate with the learned upsampling, and condition on the resulting sample-rate labels. What I'm a bit unsure about is whether I could supply them in a feed dict just like the GC embedding, or whether there might be side effects from supplying an LC input vector at every generated step.
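For reference, here is a minimal sketch of what such a learned upsampling could look like in TF 1.x, treating time as the height of a 2-D map so `conv2d_transpose` can stretch it by the label-to-sample rate. The function name, shapes, and parameters are my own assumptions, not anything that exists in this repo:

```python
import tensorflow as tf

def upsample_conditioning(lc, rate, channels=16):
    """Hypothetical learned upsampling of local-conditioning features.

    lc:   [batch, num_labels, lc_channels] label/linguistic frames
    rate: integer factor so that num_labels * rate == audio length
    """
    # Add a dummy width dimension so the 1-D sequence can go through conv2d_transpose.
    x = tf.expand_dims(lc, 2)                       # [batch, num_labels, 1, lc_channels]
    x = tf.layers.conv2d_transpose(
        x, filters=channels,
        kernel_size=(rate * 2, 1),                  # kernel spans two label frames
        strides=(rate, 1), padding='same')          # stretches time by `rate`
    return tf.squeeze(x, 2)                         # [batch, num_labels * rate, channels]
```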

lemonzi commented 7 years ago

Well, the generation code as it is written now (one sess.run() call per sample) is inefficient. It should be rewritten into a native TensorFlow loop, in which case you could just upsample the vector and use it immediately.
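As a rough illustration of that in-graph approach, a `tf.while_loop` could index the already-upsampled conditioning at each step. `step_fn` below stands in for a single-step fast-generation pass and is not defined here; this is only a sketch of the loop structure:

```python
import tensorflow as tf

def generate(total_steps, lc_upsampled, step_fn, initial_sample):
    """Sketch of an in-graph sampling loop.

    lc_upsampled: [total_steps, lc_channels], conditioning already at audio rate
    step_fn:      hypothetical single-step WaveNet forward pass,
                  sample_t = step_fn(prev_sample, lc_t)
    """
    samples = tf.TensorArray(dtype=tf.float32, size=total_steps)

    def cond(t, prev, out):
        return t < total_steps

    def body(t, prev, out):
        lc_t = lc_upsampled[t]               # conditioning for this time step
        sample = step_fn(prev, lc_t)         # one autoregressive step; must keep a fixed shape
        return t + 1, sample, out.write(t, sample)

    _, _, samples = tf.while_loop(cond, body,
                                  [tf.constant(0), initial_sample, samples])
    return samples.stack()
```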

To get it working as it is now, I guess you would have to run the upsampling separately, get it as a numpy matrix, and then feed it sample by sample. Regarding side effects, you would need to either replicate the generation Queue for the conditioning or supply wider vectors to it. I think there's a PR that implements these new Queues.
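A sketch of that driver loop, assuming hypothetical names for the ops and placeholders (`upsampled_lc_op`, `next_sample_op`, `sample_placeholder`, `lc_placeholder`); these are not the repo's actual identifiers:

```python
import numpy as np

# Run the learned upsampling once, outside the per-sample loop.
lc_matrix = sess.run(upsampled_lc_op)        # [total_steps, lc_channels] as a NumPy array
waveform = [np.zeros(1)]                     # seed sample

for t in range(lc_matrix.shape[0]):
    prediction = sess.run(next_sample_op, feed_dict={
        sample_placeholder: waveform[-1],
        lc_placeholder: lc_matrix[t],        # one row of conditioning per generated sample
    })
    waveform.append(prediction)
```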