microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

SequentialConvolution Example #3543

Open KRcpl88 opened 5 years ago

KRcpl88 commented 5 years ago

I'm trying to build a binary classifier using a 1-dimensional CNN on a sequence of n-dimensional vectors, combined with an LSTM to form a hybrid LSTM/CNN model.

It should be straightforward to do this using SequentialConvolution, but I don't see any clear examples of it. I want to apply the convolution with a window size of 4: for each step in the sequence, the convolution should look at the last 4 vectors of the input. I also want to treat each input as a multi-channel vector. In other words, I'm not trying to turn the input into a 4 x n 2D image with 1 channel and slide a window over that image; I want to treat it as a 1D sequence of n-dimensional vectors, where all 21 input features are evaluated as 21 separate feature channels for each "pixel".

I'm pretty sure this is the right way to do this; in this example n=21:

X = C.sequence.input_variable(shape=(21))
Y = C.input_variable(shape=(2))

def create_model(x):    
    with C.layers.default_options(initial_state=0):
        ################################ convolution layer
        x = C.layers.SequentialConvolution(3, 24, pad=True, activation=C.tanh)(x)
        ################################ LSTM layer
        x = C.layers.Stabilizer()(x)
        x = C.layers.LayerNormalization()(x)
        x = C.layers.Recurrence(C.layers.LSTM(shape=21))(x)
        ################################ attention layer
        a = C.layers.Dense(shape=n_hidden, activation=C.relu)(x)
        a = C.layers.Dense(shape=1, activation=C.softmax)(a)
        z = C.layers.Fold(C.plus)(C.element_times(a, x))
        ################################ reduction layer
        z = C.layers.Dense(shape=11, activation=C.relu)(z)
        z = C.layers.Dense(shape=labels_len, activation=None)(z)
    return z

Z = create_model(X)

When I try this model, the input to the convolution unit is [#,] (21,). The convolution unit has W = (24, 21, 3) and b = (24,), and the output feature map is [#,] (24,), which is exactly what I would expect for this sequence and these dimensions.

But when I try to train the model, I get a warning for EVERY mini-batch (using the GPU):

WARNING: Detected asymmetric padding issue with even kernel size and lowerPad (11) < higherPad (12) (i=0), cuDNN will not be able to produce correct result. Switch to reference engine (VERY SLOW).

I'm not sure whether something is wrong with how I'm defining the model or whether it's some other issue. I also tried pad=False, which does make the warning go away, but then Python crashes after a few mini-batches with no indication of why.

delzac commented 5 years ago

First, you get the asymmetric padding warning because cuDNN only supports symmetric padding, and the padding becomes asymmetric exactly when the filter_shape is even.

So the warning will go away when you decide not to pad, or when you do pad but the filter_shape is odd. I.e., the warning only occurs if you pad and the filter_shape is even.
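The asymmetric split can be checked with a few lines of arithmetic: with padding enabled, a convolution distributes kernel_size - 1 cells of padding across the two sides, and an even kernel forces an uneven split. A small pure-Python sketch (not CNTK code) of that split:

```python
def same_padding(kernel_size):
    """Split the total padding (kernel_size - 1) into lower/higher halves,
    the way symmetric "same" padding is distributed around a convolution window."""
    total = kernel_size - 1
    lower = total // 2
    higher = total - lower
    return lower, higher

# Odd kernel: symmetric split, cuDNN is happy.
print(same_padding(3))   # (1, 1)
# Even kernel: asymmetric split -> lowerPad < higherPad, triggering the warning.
print(same_padding(24))  # (11, 12), the exact pair reported in the warning above
```

Note that (11, 12) matches the lowerPad/higherPad values in the warning, consistent with an even kernel size along that dimension.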

For your first seqconv, the filter_shape should be a tuple whose first element is the window size along the sequence axis, with the remaining elements covering the spatial (static) dimensions.
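In code, that would look something like C.layers.SequentialConvolution(filter_shape=(3,), num_filters=24, pad=True, activation=C.tanh) (a sketch; check the exact argument names against the CNTK layers docs). The sequence-axis windowing such a layer performs, with every input dimension kept as a separate channel, can be illustrated in plain Python:

```python
def seq_conv_1d(seq, filter_width=3, pad=True):
    """Illustration of sequential-convolution windowing over a sequence of
    n-dim vectors: each output step sees `filter_width` consecutive vectors,
    and every input dimension is a separate channel (no 2D image is formed)."""
    n = len(seq[0])                      # channels per step, e.g. 21
    if pad:                              # zero-pad so output length == input length
        lower = (filter_width - 1) // 2
        higher = filter_width - 1 - lower
        seq = [[0.0] * n] * lower + list(seq) + [[0.0] * n] * higher
    windows = [seq[i:i + filter_width] for i in range(len(seq) - filter_width + 1)]
    # A real layer would now apply W of shape (num_filters, n, filter_width)
    # to each window; here we only return the window shapes.
    return [(len(w), len(w[0])) for w in windows]

seq = [[0.0] * 21 for _ in range(5)]     # 5 steps of 21-channel vectors
print(seq_conv_1d(seq))                  # 5 windows, each of shape (3, 21)
```

With pad=True the output has one window per input step, matching the [#,] (21,) -> [#,] (24,) shapes reported above once W = (24, 21, 3) is applied to each window.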

delzac commented 5 years ago

Anyway, this might be an example you are looking for.