keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

Conv1D with dilation and "causal" padding isn't reducing the size of the sequence #8751

Closed chausies closed 5 years ago

chausies commented 6 years ago

If you run

from keras.layers import Input, Conv1D

x = Input(shape=(seq_length, dim))
y = Conv1D(num_filts, 2, dilation_rate=8, padding="causal")(x)
print(y.shape)

Then the shape printed out will be (?, seq_length, dim), which means the sequence length wasn't changed at all. But isn't this incorrect? Because the dilation makes the effective kernel width of the convolution 9, shouldn't the sequence length of the output be decremented by 8 because the causal kernel needs 9 values to perform its dot product and so doesn't output on the first 8 samples in the sequence?

To clarify, I'm running with an updated tensorflow backend.
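The arithmetic behind the question, restated as a standalone sketch (numbers taken from the snippet above; the concrete `seq_length` of 128 is just an example value):

```python
# Effective (dilated) kernel width: kernel_size + (kernel_size - 1) * (dilation - 1)
kernel_size = 2
dilation = 8
effective_width = kernel_size + (kernel_size - 1) * (dilation - 1)
print(effective_width)  # 9

# With no padding ('valid'), a kernel of effective width 9 can only produce
# an output once it has 9 input samples under it, so the output is shorter:
seq_length = 128
valid_output_length = seq_length - effective_width + 1
print(valid_output_length)  # 120, i.e. seq_length - 8
```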

gabrieldemarmiesse commented 6 years ago

Thank you for the detailed issue. I'll look into it.

saftacatalinmihai commented 6 years ago

any news on this ?

gabrieldemarmiesse commented 6 years ago

I'm sorry, I took other pull requests in the meantime and forgot about this issue. Thank you for pinging me. I'm looking into it now.

gabrieldemarmiesse commented 6 years ago

I'm not familiar with causal padding, so I took a look on Google and in the code.

For those interested, here is the code in keras:

def conv_output_length(input_length, filter_size,
                       padding, stride, dilation=1):
    """Determines output length of a convolution given input length.
    """
    if input_length is None:
        return None
    assert padding in {'same', 'valid', 'full', 'causal'}
    dilated_filter_size = filter_size + (filter_size - 1) * (dilation - 1)
    if padding == 'same':
        output_length = input_length
    elif padding == 'valid':
        output_length = input_length - dilated_filter_size + 1
    elif padding == 'causal':
        output_length = input_length
    elif padding == 'full':
        output_length = input_length + dilated_filter_size - 1
    return (output_length + stride - 1) // stride

So it seems that only the stride can change the output length when using padding="causal".
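Plugging the numbers from the original example into this helper makes the point concrete (the function is restated here, with the same logic as the Keras code quoted above, so the snippet runs standalone):

```python
def conv_output_length(input_length, filter_size, padding, stride, dilation=1):
    # Same logic as the Keras helper quoted above, restated to be self-contained.
    if input_length is None:
        return None
    assert padding in {'same', 'valid', 'full', 'causal'}
    dilated_filter_size = filter_size + (filter_size - 1) * (dilation - 1)
    if padding == 'same':
        output_length = input_length
    elif padding == 'valid':
        output_length = input_length - dilated_filter_size + 1
    elif padding == 'causal':
        output_length = input_length
    elif padding == 'full':
        output_length = input_length + dilated_filter_size - 1
    return (output_length + stride - 1) // stride

# Numbers from the original example: kernel size 2, dilation rate 8.
print(conv_output_length(128, 2, 'causal', 1, dilation=8))  # 128: length preserved
print(conv_output_length(128, 2, 'valid', 1, dilation=8))   # 120: shrinks by 8
```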

From the documentation: "causal" results in causal (dilated) convolutions, e.g. output[t] does not depend on input[t+1:]. Useful when modeling temporal data where the model should not violate the temporal order. See WaveNet: A Generative Model for Raw Audio, section 2.1.

Looking again at the code in keras for the convolution in 1D:

if padding == 'causal':
    # causal (dilated) convolution:
    left_pad = dilation_rate * (kernel_shape[0] - 1)
    x = temporal_padding(x, (left_pad, 0))
    padding = 'valid'

So Keras ensures that the output does not depend on future values by padding with zeros at the start of the sequence, then acting as if the padding mode were 'valid'.

It makes sense, but it's true that the documentation doesn't say that Keras implements this shift of the output values by using zero padding, which is why the sequence length is preserved.

To get the behavior described in @chausies's message, you just need to use padding='valid' instead of padding='causal'; but then it's your job to keep track of the shift, which is (in my opinion) more complicated than letting Keras pad your tensor.
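The mechanism above can be demonstrated in plain NumPy (a single-channel, no-bias sketch of what the quoted Keras code does, not the real implementation): left-pad with zeros, then run a 'valid' dilated convolution.

```python
import numpy as np

def causal_dilated_conv1d(x, kernel, dilation):
    # Left-pad with dilation * (kernel_size - 1) zeros, then run a 'valid'
    # dilated convolution -- mirrors the Keras snippet quoted above.
    k = len(kernel)
    left_pad = dilation * (k - 1)
    xp = np.concatenate([np.zeros(left_pad), np.asarray(x, dtype=float)])
    # Output index t reads xp[t], xp[t + dilation], ..., xp[t + left_pad];
    # the last tap lines up with the original x[t], so output[t] only sees
    # x[t] and earlier samples.
    return np.array([
        sum(kernel[j] * xp[t + j * dilation] for j in range(k))
        for t in range(len(x))
    ])

x = np.arange(1.0, 17.0)                       # 16 time steps
y = causal_dilated_conv1d(x, kernel=[1.0, 1.0], dilation=8)
print(y.shape)   # (16,): length preserved, unlike 'valid'
print(y[:8])     # first 8 outputs only see zeros to their left
```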

So in the end, the documentation isn't clear, and that caused the confusion. If someone wants to do a quick PR to mention the zero padding in the docs for causal, that'd be great. I'm on holiday, so I can't do it now.

If something isn't clear, feel free to ask.

emerrf commented 6 years ago

I was looking into this today by chance and comparing it with the WaveNet implementation from: https://github.com/munich-ai-labs/keras2-wavenet/blob/master/wavenet_utils.py#L29 I have the impression that the Munich AI Labs developers had the same issue and overrode compute_output_shape. Maybe @imdatsolak has more details. HTH

gabrieldemarmiesse commented 6 years ago

Their implementation is the same as the built-in causal Keras implementation, judging from the code. It seems they were just not aware that causal already existed in Keras.

abderrahim commented 6 years ago

I think causal should not be a value of padding, but a separate option that can be used with both padding='valid' and padding='same' (and padding='full' too, I guess; I don't know that option well).

What keras does now with padding='causal' is causal convolution with padding='same' and what @chausies is asking for is causal convolution with padding='valid'.
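The "same-length" part of this claim is easy to check with the formula from the helper quoted earlier: 'same' and 'causal' both preserve the sequence length; they differ only in where the zeros go ('same' splits the padding across both ends, 'causal' puts it all on the left). The exact left/right split for 'same' can vary by backend convention, so treat this as an illustration:

```python
kernel_size, dilation = 2, 8
dilated = kernel_size + (kernel_size - 1) * (dilation - 1)  # effective width: 9

# Total padding needed to preserve length (stride 1) is dilated - 1 = 8 zeros.
same_pad = ((dilated - 1) // 2, (dilated - 1) - (dilated - 1) // 2)  # split both sides
causal_pad = (dilated - 1, 0)                                        # all on the left
print(same_pad)    # (4, 4)
print(causal_pad)  # (8, 0)
```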

I propose adding a new causal option, and making padding='causal' a deprecated equivalent to causal=True, padding='same'. If there is no objection to this, I'll try to implement it.

gabrieldemarmiesse commented 6 years ago

Can you give a minimal example of Conv1D(causal=True, padding='valid') in the way you think about it? What would be the difference with Conv1D(causal=False, padding='valid')? Can you give an example with NumPy arrays so that it's explicit?

abderrahim commented 6 years ago

@gabrieldemarmiesse thanks for making me think more deeply about this. You are indeed right. It is a documentation issue.

gabrieldemarmiesse commented 6 years ago

Would you have the time to do a PR on the docs to mention the zero padding?

wt-huang commented 5 years ago

Closing as this is resolved

rjpg commented 5 years ago

hello,

I am trying to understand the temporal relation with 'causal' padding ...

My goal is to implement it in 2D ... for Conv2D, assuming the x-axis holds the time steps and the y-axis holds several variables ...

For example, assuming K.set_image_dim_ordering("th"), the input shape of Conv2D is (batch, filters, x, y),

and (None, 1, 128, 9) means we have 128 time steps and 9 variables in a multivariate time series problem.

How hard would it be to create or include (with a new extended Conv2D) 'causal' padding in Conv2D?

We could also choose whether the "time dimension" is vertical in x or horizontal in y ...

For the other "variables" dimension, we could choose 'valid' or 'same' ... If someone is thinking of giving a simple example, assume 'valid' padding for the variables dimension, since that seems to be the one used inside 'causal' padding ...
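One way to approximate this without a new layer (my own suggestion, not something Keras ships as a 'causal' Conv2D) is the same trick the 1D code uses: zero-pad only the left of the time axis, then run a 'valid' convolution. A pure-NumPy sketch with time causal and the variables axis 'valid', single channel:

```python
import numpy as np

def causal_time_conv2d(x, kernel):
    # x: (time, variables); kernel: (kt, kv).
    # Causal along the time axis: pad kt - 1 zero rows on the left (past) only,
    # then slide the kernel in 'valid' mode over both axes. Output row t
    # therefore sees only input times t - kt + 1 .. t.
    kt, kv = kernel.shape
    xp = np.concatenate([np.zeros((kt - 1, x.shape[1])), x], axis=0)
    T, V = x.shape[0], x.shape[1] - kv + 1
    out = np.empty((T, V))
    for t in range(T):
        for v in range(V):
            out[t, v] = np.sum(kernel * xp[t:t + kt, v:v + kv])
    return out

x = np.random.rand(128, 9)                 # 128 time steps, 9 variables
y = causal_time_conv2d(x, np.ones((3, 3)))
print(y.shape)                             # (128, 7): time preserved, variables 'valid'
```

In Keras itself, the same effect could presumably be built from an asymmetric ZeroPadding2D on the time axis followed by Conv2D with padding='valid', though I haven't tested that combination.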