Open · codaich opened this issue 7 years ago
Is it possible to access the output of inner network layers using this codebase and, if so, how?

I ask because we’re interested in 1) training a network on certain kinds of audio and then using the trained network to generate new, similar audio (much like the babbling or classical-piano generation described in https://deepmind.com/blog/wavenet-generative-model-raw-audio/) and 2) seeing what effect each network layer has on the audio that is generated (much as Google did for images in https://research.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html, where they “pick a layer and ask the network to enhance whatever it detected”).

Related to this, is there a particular network architecture that might be suited to doing this? For example, would the inner layers examined this way need to be the same size as the final layer?

Thanks!
Short answer: Yes... Maybe...

Longer answer: Perhaps. Compiling the network to a portable model and using that to loop over data could be one solution. What does this mean in practice? With scalar input, you could apply the batch loss function to an entire audio clip initialised with noise, and do gradient ascent on the input so that it maximises the activation of a chosen layer/neuron. It would basically be the same procedure as https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/deepdream/deepdream.ipynb
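In TF 1.x terms that might look something like the minimal sketch below. `build_wavenet` and its per-layer activation list are assumptions for illustration, not part of this repo; in practice you would reuse WaveNetModel's network-construction code and keep references to the intermediate tensors it creates:

```python
import tensorflow as tf  # TF 1.x, matching the era of this codebase

# Hypothetical helper: builds the WaveNet graph on its input and returns a
# list of intermediate activations, one tensor per dilation layer.
from model import build_wavenet  # assumed, not part of the actual repo

SAMPLES = 16000     # one second at 16 kHz
LAYER = 10          # hidden layer whose activity we maximise
STEPS = 200
STEP_SIZE = 1e-2

graph = tf.Graph()
with graph.as_default():
    # The thing being "dreamed" on is the raw waveform, held as a variable
    # so it can be optimised directly (initialised with noise, as above).
    audio = tf.Variable(tf.random_normal([1, SAMPLES, 1], stddev=0.1))
    activations = build_wavenet(audio)

    # Objective: mean activation of one layer. To target a single neuron,
    # index the channel dimension instead, e.g. activations[LAYER][..., 3].
    objective = tf.reduce_mean(activations[LAYER])

    grad = tf.gradients(objective, audio)[0]
    # Normalised gradient *ascent* step, as in the deepdream notebook.
    grad /= tf.sqrt(tf.reduce_mean(tf.square(grad))) + 1e-8
    step = tf.assign(audio,
                     tf.clip_by_value(audio + STEP_SIZE * grad, -1.0, 1.0))

with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    # Restore the trained weights here with tf.train.Saver (excluding the
    # `audio` variable) before stepping.
    for i in range(STEPS):
        _, obj = sess.run([step, objective])
        if i % 20 == 0:
            print('step %d, objective %.4f' % (i, obj))
    dreamed = sess.run(audio)  # a waveform that excites the chosen layer
```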
However, since WaveNet is autoregressive, what could yield more interesting results would be to do style transfer somehow. I don't understand the process well enough to implement it, but I'm guessing layers 5-20 or so should carry "timbral style" information.
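I can't vouch for this working, but by analogy with Gatys et al.'s image style transfer, the "timbral style" of those mid layers could be summarised with Gram matrices over the channel dimension. A sketch of such a loss (the per-layer activation lists are the same assumption as above):

```python
import tensorflow as tf  # TF 1.x

def gram_matrix(acts):
    """Channel-correlation statistics of one layer's activations.

    acts: [1, time, channels] tensor from one WaveNet dilation layer.
    Averaging over time gives a time-invariant "texture" summary, by
    analogy with Gatys et al.'s image style transfer.
    """
    channels = acts.get_shape().as_list()[-1]
    flat = tf.reshape(acts, [-1, channels])              # [time, channels]
    n = tf.cast(tf.shape(flat)[0], tf.float32)
    return tf.matmul(flat, flat, transpose_a=True) / n   # [chan, chan]

def style_loss(style_acts, gen_acts, layers=range(5, 20)):
    # Match the Gram matrices of the generated audio to those of the
    # style clip on the mid layers guessed at above.
    return tf.add_n([
        tf.reduce_mean(tf.square(gram_matrix(gen_acts[l]) -
                                 gram_matrix(style_acts[l])))
        for l in layers])
```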
Another interesting thing to try would be to adjust an input (noise or a sound file), again with gradient ascent, so that the network's output matches another sound file under various global conditioning settings, etc.
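This would reuse the gradient-ascent loop above with a different objective: instead of maximising one layer, minimise the distance between the activations computed for the optimised input and for the target sound file. Roughly, with the same assumed `build_wavenet` helper and weight-sharing details that would need checking against the real code:

```python
import tensorflow as tf  # continues the gradient-ascent sketch above

# `audio`, `build_wavenet` and STEP_SIZE are as in the earlier sketch;
# `target_audio` is the reference clip as a [1, samples, 1] float32
# constant (e.g. loaded with librosa and wrapped in tf.constant).

# Build the network twice on shared weights: once on the reference clip
# and once on the input variable being optimised. In TF 1.x that means
# constructing both under the same variable scope, with reuse.
with tf.variable_scope('wavenet'):
    target_acts = build_wavenet(target_audio)
with tf.variable_scope('wavenet', reuse=True):
    gen_acts = build_wavenet(audio)

# L2 distance between activations; stop_gradient keeps the target fixed.
match_loss = tf.add_n([
    tf.reduce_mean(tf.square(g - tf.stop_gradient(t)))
    for g, t in zip(gen_acts, target_acts)])

# Gradient *descent* on the loss this time; the update loop is otherwise
# identical to the earlier sketch. Global conditioning (speaker id etc.)
# would just be an extra argument to build_wavenet.
grad = tf.gradients(match_loss, audio)[0]
step = tf.assign(audio,
                 tf.clip_by_value(audio - STEP_SIZE * grad, -1.0, 1.0))
```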
Thanks, much appreciated.

@codaich Did anything interesting come out of maximizing the activity of hidden layers/neurons?