Hi,
Although the current generate.py uses the fast WaveNet generation algorithm, it is still too slow.
Is it possible to quantize the network?
However, the TF quantization tutorial says we need to specify the input and output node names, e.g. `--outputs="softmax" --out_graph=/tmp/xxx`. I think the class information will be lost in this quantization process, such as the properties of the WaveNetModel class. The queue used for fast generation would also be lost, because the quantized output is just a graph definition; it cannot contain other properties such as the enqueue/dequeue operations needed for fast computation. When I tried freezing the model, it also lost the queues and the other class properties.
Another way to speed up inference: could we dump the weights of each WaveNet layer and then do the matrix computation with NumPy or the C++ Eigen library? If so, how can we dump the value of each kernel?
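To illustrate what I mean, here is a minimal NumPy sketch of the per-layer computation once the kernels are dumped: a dilated causal convolution written as plain matrix products. The layer sizes, the random weights, and the `(time, channels)` layout are made-up assumptions for illustration, not the actual WaveNetModel shapes.

```python
import numpy as np

# Hypothetical dumped weights for one dilated causal conv layer.
# Kernel shape (filter_width, in_channels, out_channels); sizes are
# invented for this example, not taken from the real model.
rng = np.random.default_rng(0)
filter_width, in_ch, out_ch, dilation = 2, 4, 8, 2
kernel = rng.standard_normal((filter_width, in_ch, out_ch))
x = rng.standard_normal((16, in_ch))  # (time, channels) input

def dilated_causal_conv(x, kernel, dilation):
    """Dilated causal 1-D convolution as plain matrix products."""
    T = x.shape[0]
    fw, _, out_ch = kernel.shape
    # Left-pad so the output at time t only sees inputs at t and earlier.
    pad = (fw - 1) * dilation
    xp = np.concatenate([np.zeros((pad, x.shape[1])), x], axis=0)
    y = np.zeros((T, out_ch))
    for k in range(fw):
        # Tap k of the filter reads input shifted by k * dilation steps.
        offset = k * dilation
        y += xp[offset:offset + T] @ kernel[k]
    return y

y = dilated_causal_conv(x, kernel, dilation)
print(y.shape)  # (16, 8)
```

The same loop maps directly to Eigen matrix products in C++, so if there is a way to dump each kernel's values, the whole generation loop could run outside the TF graph.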