ibab / tensorflow-wavenet

A TensorFlow implementation of DeepMind's WaveNet paper
MIT License

Thoughts on 'crackle' #271

Open Sciss opened 7 years ago

Sciss commented 7 years ago

Hi there,

thanks for your work implementing this. I'm trying to see if I can use it in a scenario where I want to experiment with extrapolating musical material. Unfortunately, my computer is too slow (it's an i7 laptop, but no GPU); I'm running now with a single audio file in the corpus at the default sample rate of 16 kHz and getting 2 steps per minute. I've had this running for a day and a half, but I don't see it converging fast enough. It captures the timbre of the sound, which is great, but there is a crackle in the results that doesn't seem to go away.

Here are some iterations: https://soundcloud.com/sciss/sets/269171-wavenet

In particular, it's nice to see that the noise in iter 1200 disappears and the resonances start to come out. When I heard iter 2500 I thought, great, it's going to work out, the crackle is almost gone. But then in iter 4400 the crackle has become more prominent again. Looking at the waveform, it seems the system introduces small bursts at the Nyquist frequency, which I think are responsible for the crackling sound: http://i.imgur.com/WQ97Zu6.png

Any thoughts on that?

Also, I would like to change the algorithm in two ways:

- replace the 8-bit category encoding with a continuous (floating-point) representation, and
- extend the model to handle multi-channel (e.g. stereo) audio.

Any thoughts on this?

Thanks!

vjravi commented 7 years ago

Similar to what the authors claim, I have also noticed that results are better with a bigger dataset. And convergence does not happen until at least 50k iterations, sometimes going up to 200k. I would keep everything else in the algorithm as it is, with the addition of a decaying learning rate.

In other words, you really need a GPU.
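
For the decaying learning rate, TF 1.x has tf.train.exponential_decay; a minimal sketch (the schedule values are illustrative, not a recommendation):

import tensorflow as tf

# Illustrative values only; tune decay_steps/decay_rate for your dataset.
global_step = tf.Variable(0, trainable=False, name='global_step')
learning_rate = tf.train.exponential_decay(
    learning_rate=1e-3,    # initial learning rate
    global_step=global_step,
    decay_steps=10000,     # decay every 10k iterations
    decay_rate=0.5,        # halve the rate each time
    staircase=True)
optimizer = tf.train.AdamOptimizer(learning_rate)
# train_op = optimizer.minimize(loss, global_step=global_step)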

Sciss commented 7 years ago

D'oh. And if I manage to replace the category encoding? Wouldn't that instantly yield a speed-up of 256x?

What I noticed is that not all CPU cores are fully utilised; I think TensorFlow, at least the version I installed, is not well parallelised. I have a handful of Raspberry Pis lying around, and a Core i5 Mac Mini with Linux. I could offload the rendering to those boxes and at least free up my laptop; but it's a company machine, so there is no option to add a GPU (and I have no budget for one anyway).

lemonzi commented 7 years ago

It's easy to remove the 8-bit encoding; it's only there because results tend to be better with speech. If you remove it, though, you won't get a 256x speed-up; the first and last layers will be faster, but the intermediate layers will still have high-dimensional embeddings that you need to compute anyway. The architecture is something like:

floating-point sample -> 8-bit encoding -> high-dimensional embedding -> wavenet -> high-dimensional embedding -> 8-bit encoding -> output sampling
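
For reference, the 8-bit step is the μ-law companding from the paper; a minimal NumPy sketch of that stage (the repo has its own TensorFlow version of this in wavenet/ops.py, if I recall correctly):

import numpy as np

def mu_law_encode(audio, quantization_channels=256):
    # Compress the [-1, 1] waveform with mu-law, then quantize to 8-bit classes.
    mu = quantization_channels - 1
    magnitude = np.log1p(mu * np.abs(audio)) / np.log1p(mu)
    signal = np.sign(audio) * magnitude
    return ((signal + 1) / 2 * mu + 0.5).astype(np.int32)

def mu_law_decode(output, quantization_channels=256):
    # Map the 8-bit class indices back to a waveform in [-1, 1].
    mu = quantization_channels - 1
    signal = 2 * (output.astype(np.float32) / mu) - 1
    return np.sign(signal) * (np.expm1(np.abs(signal) * np.log1p(mu)) / mu)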

If you want to have a multi-channel model, you need to decide which parts are shared (the same weights applied to each channel separately, a kind of depthwise 1D convolution) and in which parts you "mix things up" across channels (rough sketch below).
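
A rough sketch of the two options in TF 1.x, assuming a [batch, time, channels] stereo input (none of this is in the repo, and the layer sizes are arbitrary):

import tensorflow as tf

# Hypothetical stereo input: [batch, time, channels] with channels = 2.
x = tf.placeholder(tf.float32, [None, 16000, 2])

# (a) Shared weights: fold the channels into the batch dimension so the same
# causal/dilated filters process each channel independently.
per_channel = tf.reshape(tf.transpose(x, [0, 2, 1]), [-1, 16000, 1])
shared = tf.layers.conv1d(per_channel, filters=32, kernel_size=2,
                          dilation_rate=2, padding='valid')

# (b) Mixing: a 1x1 convolution over the channel axis combines information
# from both channels at this point in the network.
mixed = tf.layers.conv1d(x, filters=32, kernel_size=1)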

Regarding efficiency: a GPU definitely helps. The code could probably be more efficient, especially the generation part. Regardless, if you run on CPU you will need to tweak some parameters regarding how many threads are run per queue, which are hard-coded. It's also very, very important that you install TensorFlow from source so it makes full use of advanced CPU instructions, and that you have the optimised C++ protobuf package (I had to compile and install it separately from TensorFlow).
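
On top of the queue threads, TF 1.x also lets you size the session's own thread pools; a small sketch with example values (set them to roughly your number of physical cores):

import tensorflow as tf

# Example values; intra-op parallelises work inside single ops (e.g. convolutions),
# inter-op runs independent ops concurrently.
config = tf.ConfigProto(intra_op_parallelism_threads=4,
                        inter_op_parallelism_threads=2)
sess = tf.Session(config=config)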

Sciss commented 7 years ago

Thanks for the comments. I noticed that TF prints some warnings about not being compiled with some CPU instructions that are available on my machine, so I should definitely compile it from source.

lemonzi commented 7 years ago

@Sciss yes, exactly! Regarding the protobuf package: You can check with

python -c "from google.protobuf.internal import api_implementation; print(api_implementation._default_implementation_type)"

That should print "cpp" if the optimised implementation is in use, or "python" if not. If not, follow these instructions after you have compiled TensorFlow (TensorFlow installs a non-optimised protobuf, which you then override):

https://www.tensorflow.org/install/install_linux#protobuf_pip_package_31

Basically, install the optimized version (URL depends on OS and Python version) with:

pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/protobuf-3.1.0-cp27-none-linux_x86_64.whl

Or you can even try and compile protobuf from source. I don't remember now which one I did.

lemonzi commented 7 years ago

Also, try using more threads here:

https://github.com/ibab/tensorflow-wavenet/blob/master/wavenet/audio_reader.py#L194

It could be that your CPU is stalling waiting for more data to be read from the hard drive and pre-processed.
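
If the reader API is still what I remember (AudioReader.start_threads(sess, n_threads=1)), the quickest experiment is to pass a larger n_threads from train.py, e.g.:

# reader and sess created as in train.py; n_threads defaults to 1 there.
threads = reader.start_threads(sess, n_threads=4)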