ibab / tensorflow-wavenet

A TensorFlow implementation of DeepMind's WaveNet paper
MIT License
5.41k stars 1.3k forks

Is it possible to work with a multivariate time series approach? #308

Open nandofernandesneto opened 6 years ago

nandofernandesneto commented 6 years ago

Hi! First of all, I would like to congratulate you on your brilliant work. So far, I have adapted your code to work with time series instead of audio files. For now, I'm using it to (successfully) predict the solutions of ordinary differential equations, both linear and nonlinear.

I was wondering whether it would be easy to adapt it to work with more than one observable state. That would let me simulate complete systems and teach the neural network some physical laws, which could be useful for dynamic optimization and similar tasks in cases where the system has not been identified.

Do you have any ideas about this? If you are interested, I could publish a fork of your project with these modifications here on GitHub. For now, I can only simulate univariate systems.

Best Regards,

Fernando Fernandes

ljuvela commented 6 years ago

These guys used a very WaveNet-like architecture in a multivariate setup by replacing the classification task with a Gaussian mixture density network: http://www.isca-speech.org/archive/Interspeech_2017/abstracts/1420.html

Also, the new paper (https://deepmind.com/blog/high-fidelity-speech-synthesis-wavenet/) mentions using logistic mixtures and cites https://arxiv.org/pdf/1701.05517.pdf for details.

I guess it should be useful to explore these mixture density approaches for both multivariate modelling and understanding the latest WaveNet developments.
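To make the mixture density idea concrete, here is a minimal NumPy sketch of the loss such an output layer would minimize for a multivariate target. It is not taken from the linked paper or from this repository. It assumes diagonal-covariance Gaussian components, and the function name `gaussian_mixture_nll` and the array shapes are hypothetical choices for illustration.

```python
import numpy as np


def gaussian_mixture_nll(x, logit_pi, mu, log_sigma):
    """Mean negative log-likelihood of x under a diagonal Gaussian mixture.

    Assumed (hypothetical) shapes:
      x:         (T, D)     observed vectors, D variables per time step
      logit_pi:  (T, K)     unnormalized mixture weights
      mu:        (T, K, D)  component means
      log_sigma: (T, K, D)  log standard deviations
    """
    # Standardized residual of every variable under every component.
    z = (x[:, None, :] - mu) * np.exp(-log_sigma)
    # Per-variable Gaussian log-density, shape (T, K, D).
    log_pdf = -0.5 * np.log(2.0 * np.pi) - log_sigma - 0.5 * z ** 2
    # Diagonal covariance: variables are independent within a component.
    log_comp = log_pdf.sum(axis=-1)
    # Log-softmax over the K components.
    log_pi = logit_pi - np.logaddexp.reduce(logit_pi, axis=-1, keepdims=True)
    # Log-sum-exp over components gives the log-likelihood per time step.
    log_lik = np.logaddexp.reduce(log_pi + log_comp, axis=-1)
    return -log_lik.mean()


# Toy call with random parameters standing in for network outputs.
rng = np.random.default_rng(0)
T, K, D = 10, 3, 2
nll = gaussian_mixture_nll(
    rng.normal(size=(T, D)),
    rng.normal(size=(T, K)),
    rng.normal(size=(T, K, D)),
    0.1 * rng.normal(size=(T, K, D)),
)
```

In a real model the three parameter arrays would come from the network's final layer instead of the softmax over quantized amplitudes, and the loss would be written with TensorFlow ops so that it is differentiable.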

nandofernandesneto commented 6 years ago

Thanks for your reply. I will take a look. Btw, have you implemented anything like that so far?

PS: I was thinking of attacking this problem as a multi-channel audio. Audio files are nothing more than high-frequency time series... Any thoughts on this idea? Is it hard?

ljuvela commented 6 years ago

I've been experimenting with the mixture-of-logistics stuff lately, and it could extend to multiple channels with relative ease.

It turns out that OpenAI's PixelCNN++ already does multichannel modelling (the RGB color channels). Check out their code at https://github.com/openai/pixel-cnn/tree/master/pixel_cnn_pp

They are using a weight-tying scheme for the channels, which makes the code a bit messy. In principle, you could remove this from the output layer and just assume that the logistic mixture parameters depend only on a shared latent variable (represented by the penultimate layer of the post-processing module).
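A rough NumPy sketch of that simplification, under several assumptions of mine: one linear projection of the penultimate features yields all mixture parameters, the channels are independent given the mixture component, and the likelihood is the continuous logistic density instead of the discretized one used in PixelCNN++. The names `split_params` and `logistic_mixture_nll`, the shapes, and the log-scale floor of -7 are hypothetical choices, not code from either repository.

```python
import numpy as np


def split_params(h, W, b, K, C):
    """Project features h (T, H) to parameters of a K-component mixture over C channels."""
    out = h @ W + b                                # (T, K + 2*K*C)
    T = h.shape[0]
    logit_pi = out[:, :K]                          # weights shared by all channels
    mu = out[:, K:K + K * C].reshape(T, K, C)      # per-channel locations
    log_s = out[:, K + K * C:].reshape(T, K, C)    # per-channel log scales
    log_s = np.maximum(log_s, -7.0)                # floor chosen for numerical stability
    return logit_pi, mu, log_s


def logistic_mixture_nll(x, logit_pi, mu, log_s):
    """Mean negative log-likelihood of x (T, C) under the mixture."""
    z = (x[:, None, :] - mu) * np.exp(-log_s)
    # Logistic log-density: -z - log(s) - 2 * log(1 + exp(-z)).
    log_pdf = -z - log_s - 2.0 * np.logaddexp(0.0, -z)
    # Channels are independent given the component, so sum over C.
    log_comp = log_pdf.sum(axis=-1)
    # Log-softmax over the K components.
    log_pi = logit_pi - np.logaddexp.reduce(logit_pi, axis=-1, keepdims=True)
    return -np.logaddexp.reduce(log_pi + log_comp, axis=-1).mean()


# Toy call: random features stand in for the penultimate layer output.
rng = np.random.default_rng(0)
T, H, K, C = 16, 8, 3, 2
h = rng.normal(size=(T, H))
W = 0.1 * rng.normal(size=(H, K + 2 * K * C))
b = np.zeros(K + 2 * K * C)
x = rng.normal(size=(T, C))
nll = logistic_mixture_nll(x, *split_params(h, W, b, K, C))
```

Because the mixture weights are shared across channels, the channels are still correlated through the choice of component, even without the explicit channel-to-channel dependence that the weight-tying scheme provides.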

rafaelvalle commented 6 years ago

@ljuvela thanks for sharing this link!