ibab / tensorflow-wavenet

A TensorFlow implementation of DeepMind's WaveNet paper
MIT License

Audio as local input #175

Open veqtor opened 8 years ago

veqtor commented 8 years ago

With audio as local input we could explore using WaveNets for audio processing: can they be used to simulate reverbs, non-linear time-varying signal processing (chorus/flanger), etc.? Furthermore, if we had a built-in sample-rate converter, we could probably train a network to upsample low 8 kHz or 16 kHz audio to 44.1 kHz and estimate which frequencies are missing; this could probably be done without a very long receptive field. Perhaps the same approach could be used for bit-depth estimation (converting from 8-bit to 16-bit).

Flipboard uses deep networks to upscale images; it would make sense that we could do the same for audio. That way we could stick to low sample rates and bit depths for generation and attach a cheaper upscaling network to get better-quality sound.

stefbraun commented 8 years ago

I am also interested in using other time series (e.g. audio features) as local input. It seems like we only have to modify the operation in https://github.com/ibab/tensorflow-wavenet/blob/master/wavenet/model.py#L210 to incorporate the second time series y (as the paper names it on page 5).
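For reference, the change would turn the gated activation z = tanh(W_f ∗ x) ⊙ σ(W_g ∗ x) into z = tanh(W_f ∗ x + V_f ∗ y) ⊙ σ(W_g ∗ x + V_g ∗ y), i.e. each branch gets an extra 1×1 convolution over the conditioning series. A minimal NumPy sketch of that equation, assuming shapes of (time, channels) and reducing the 1×1 convolutions to plain matrix multiplies; all names here are illustrative, not the repo's actual API:

```python
import numpy as np

def gated_unit(x, y, W_f, W_g, V_f, V_g):
    """Gated activation with local conditioning:
        z = tanh(W_f @ x + V_f @ y) * sigmoid(W_g @ x + V_g @ y)
    x: (time, in_channels) output of the dilated convolution
    y: (time, cond_channels) local conditioning time series,
       assumed already resampled to the same time resolution as x."""
    filt = np.tanh(x @ W_f + y @ V_f)
    gate = 1.0 / (1.0 + np.exp(-(x @ W_g + y @ V_g)))  # sigmoid
    return filt * gate

# Toy example: 4 timesteps, 2 audio channels, 3 conditioning channels,
# 5 output channels.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 2))
y = rng.standard_normal((4, 3))
W_f, W_g = rng.standard_normal((2, 2, 5))  # two (2, 5) weight matrices
V_f, V_g = rng.standard_normal((2, 3, 5))  # two (3, 5) weight matrices
z = gated_unit(x, y, W_f, W_g, V_f, V_g)
print(z.shape)
```

Since tanh is bounded by ±1 and the sigmoid gate lies in (0, 1), every entry of z stays in (−1, 1) regardless of the conditioning input, which is what makes stacking these units stable.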