This is a fork of ibab's excellent implementation of WaveNet. Here we are implementing changes for the generation of facial animations.
The authors of WaveNet stated that modeling the conditional distribution p(x_t \mid x_1, \ldots, x_{t-1}) with a softmax distribution tends to work better than earlier approaches such as mixture density networks or mixtures of conditional Gaussian scale mixtures.
This approach models the value of each shape key in each frame as a softmax probability distribution over 11 classes. The probability values are calculated with the method explained here.
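As a rough sketch of what such a discretization could look like (the class count of 11 is from the text; the assumption that shape key values lie in [0, 1] and are binned uniformly is mine, not stated in the source):

```python
import numpy as np

NUM_CLASSES = 11  # 11 softmax classes per shape key, as described above


def quantize(value, num_classes=NUM_CLASSES):
    """Map a shape key value (assumed to lie in [0, 1]) to a class index."""
    value = np.clip(value, 0.0, 1.0)
    return int(round(value * (num_classes - 1)))


def dequantize(cls, num_classes=NUM_CLASSES):
    """Map a class index back to a representative shape key value."""
    return cls / (num_classes - 1)


print(quantize(0.0))  # -> 0
print(quantize(0.5))  # -> 5
print(quantize(1.0))  # -> 10
```

During training, each quantized class index would be the target of the softmax; during generation, a sampled class is mapped back to a shape key value with `dequantize`.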
Here is the GitHub branch for this experiment.
This approach is a variant of the approach adopted in PixelCNN, which extends the dependencies among pixels to the color channels. In the PixelRNN paper the authors state the joint probability for pixel sampling as

p(x) = \prod_{i=1}^{n^2} p(x_i \mid x_1, \ldots, x_{i-1}),

where x_i is the i-th pixel.
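The chain-rule factorization above can be illustrated with a toy example. The conditional distribution below is a hypothetical stand-in (not the PixelRNN network), used only to show that multiplying valid conditionals yields a valid joint distribution:

```python
# Toy autoregressive model over a tiny 4-"pixel" image with 2 intensity
# levels, illustrating p(x) = prod_i p(x_i | x_1, ..., x_{i-1}).


def conditional(history):
    """Hypothetical p(x_i = 1 | history): biased toward repeating the last value."""
    if not history:
        return 0.5
    return 0.8 if history[-1] == 1 else 0.2


def joint_prob(pixels):
    """Joint probability as the product of the per-pixel conditionals."""
    p = 1.0
    for i, x in enumerate(pixels):
        p1 = conditional(pixels[:i])
        p *= p1 if x == 1 else 1.0 - p1
    return p


# The joint probabilities of all 2^4 possible images sum to 1, as required.
total = sum(joint_prob([a, b, c, d])
            for a in (0, 1) for b in (0, 1)
            for c in (0, 1) for d in (0, 1))
print(round(total, 10))  # -> 1.0
```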
They also extend these pixel dependencies to the color channels as follows. If we use indexes for the color channels (assign 1 to the R index, 2 to the G index and 3 to the B index), the joint probability p(x) can now be written as

p(x) = \prod_{i=1}^{n^2} \prod_{j=1}^{3} p(x_{i,j} \mid x_1, \ldots, x_{i-1}, x_{i,1}, \ldots, x_{i,j-1}).
This type of joint probability can be applied to shape key sampling. The above joint probability can be rewritten for shape key sampling as

p(s) = \prod_{i=1}^{S} \prod_{j=1}^{N} p(s_{i,j} \mid s_1, \ldots, s_{i-1}, s_{i,1}, \ldots, s_{i,j-1}),

where S is the number of samples (frames), N is the number of shape keys in a single frame, s_i is the i-th frame, and s_{i,j} is the j-th shape key of the i-th frame.
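The nested product above implies a nested sampling loop: frames are generated in order, and within each frame the shape keys are sampled one at a time, each conditioned on all previous frames and on the shape keys already sampled in the current frame. A minimal sketch, in which the model is a random stand-in for the actual network and S, N are small illustrative values:

```python
import numpy as np

NUM_FRAMES = 4       # S: number of frames to generate (illustrative)
NUM_SHAPE_KEYS = 3   # N: shape keys per frame (illustrative)
NUM_CLASSES = 11     # softmax classes per shape key, as described above

rng = np.random.default_rng(0)


def model_logits(frame_history, key_history):
    """Hypothetical stand-in for the trained network: returns unnormalized
    scores over the classes given all previous frames and the shape keys
    already sampled in the current frame."""
    return rng.normal(size=NUM_CLASSES)


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


frames = []
for i in range(NUM_FRAMES):           # outer product: frames s_1 .. s_S
    keys = []
    for j in range(NUM_SHAPE_KEYS):   # inner product: shape keys s_{i,1} .. s_{i,N}
        probs = softmax(model_logits(frames, keys))
        cls = rng.choice(NUM_CLASSES, p=probs)
        keys.append(cls / (NUM_CLASSES - 1))  # map class back to [0, 1]
    frames.append(keys)

print(np.array(frames).shape)  # -> (4, 3)
```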
The code implementation for this experiment is found on the GitHub branch.
TensorFlow needs to be installed before running the training script. The code is tested on TensorFlow 1.0.1 for Python 2.7 and Python 3.5.
In addition, librosa must be installed for reading and writing audio.
To install the required python packages, run
pip install -r requirements.txt
For GPU support, use
pip install -r requirements_gpu.txt
To install the test requirements, run
pip install -r requirements_test.txt
To run the test suite, execute
./ci/test.sh