vincentherrmann / pytorch-wavenet

An implementation of WaveNet with fast generation
MIT License

Receptive Field #14

Closed: problemsniper closed this issue 6 years ago

problemsniper commented 6 years ago

Hello @vincentherrmann, great work on WaveNet in PyTorch. I am not able to understand your receptive field calculation. Can you please let me know how it is calculated? I was thinking the receptive field size = blocks * (2 ^ (layers + 1)).

Can you please let me know how this is calculated in your code, which formula is being used, and what justification there is for using that formula?

vincentherrmann commented 6 years ago

Hi! Maybe it depends on how you count layers, but I think the simplest correct formula for the receptive field is blocks * 2^layers - blocks + 1. In order to really understand it, you may have to draw some dilation diagrams. But it's easy to see, for example, that a network with three blocks, each containing just one layer, has a receptive field of size 4. If we want to use a dilation or kernel size other than 2, the formula gets pretty complicated. Here's what I came up with, though I'm not absolutely sure it's right: receptive field = blocks * dilation^layers * (kernel_size - 1) - (kernel_size - 2) - blocks + 1

In the code I don't use a formula; the receptive field is simply calculated one layer at a time. Each layer adds (kernel_size - 1) * current_dilation to the receptive field. This even allows for multiple dilations or kernel sizes in one model.
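
For readers following along, here is a minimal sketch of the layer-by-layer calculation described above. It is not the repository's actual code: the function name, its parameters, and the assumption that the dilation restarts at 1 in every block and is multiplied by dilation_base after each layer are all illustrative.

```python
def receptive_field(blocks, layers, kernel_size=2, dilation_base=2):
    """Accumulate the receptive field one layer at a time (illustrative sketch)."""
    field = 1  # a single output sample always sees at least one input sample
    for _ in range(blocks):
        dilation = 1  # assumption: dilation restarts at 1 in every block
        for _ in range(layers):
            # each layer adds (kernel_size - 1) * current_dilation
            field += (kernel_size - 1) * dilation
            dilation *= dilation_base  # assumption: geometric dilation growth
    return field


print(receptive_field(blocks=3, layers=1))   # 4, the example from this thread
print(receptive_field(blocks=3, layers=10))  # 3070, i.e. 3 * 2**10 - 3 + 1
```

Under those assumptions (and a dilation base greater than 1) the loop sums to 1 + blocks * (kernel_size - 1) * (dilation_base^layers - 1) / (dilation_base - 1), which for kernel size 2 and dilation base 2 reduces to blocks * 2^layers - blocks + 1. The loop is handy for checking any closed-form candidate against other kernel sizes or dilations.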

problemsniper commented 6 years ago

Awesome! Makes sense. Is there a reference for this simplest way of calculating the receptive field? I just want to know why that formula works 😅

Also, another question along the same lines. I see that the item length passed to the audio_data class is item_length=model.receptive_field + model.output_length - 1. I am not sure about two things here: why is the output length (16) added, and why is there a -1 at the end?

Thanks a lot in advance for your help.

vincentherrmann commented 6 years ago

I don't have a reference, sorry. It's a bit hard to put into words; I think it needs some visual help to explain. When I have time I will hopefully upload a picture.

The item_length is the number of samples the network gets as input during training, and output_length is the number of consecutive samples the network outputs. If we were to output only one sample (output_length = 1), then item_length would equal model.receptive_field. But for each additional output sample we also need one additional input sample, so item_length=model.receptive_field + (output_length - 1). Of course, during generation we have to set output_length=1.
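
The arithmetic above can be sketched as follows. The helper names and the receptive field value of 3070 are hypothetical and chosen only for illustration; the output length of 16 is the value mentioned in this thread.

```python
def item_length(receptive_field, output_length):
    """Input samples needed to produce output_length consecutive outputs."""
    if receptive_field < 1 or output_length < 1:
        raise ValueError("receptive_field and output_length must be >= 1")
    # the first output needs a full receptive field; each further output
    # needs exactly one more input sample
    return receptive_field + (output_length - 1)


def count_windows(num_samples, receptive_field):
    """How many full receptive-field windows fit into num_samples inputs."""
    return max(0, num_samples - receptive_field + 1)


rf = 3070  # hypothetical receptive field
n = item_length(rf, output_length=16)
print(n)                     # 3085
print(count_windows(n, rf))  # 16, one window per output sample
```

Sliding a window of receptive_field samples one step at a time over the item gives exactly output_length positions, which is where the -1 comes from.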

problemsniper commented 6 years ago

Amazing. Thanks a lot @vincentherrmann