microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

LSTM and Convolution #2272

Open Worldexe opened 7 years ago

Worldexe commented 7 years ago

Can't make it work. I have a sequence of vectors of size 240 as features and one label per sequence, so this is sequence-to-one. A simple LSTM setup with BS.Sequences.Last and a dynamic axis works OK. But I want to apply a convolution to each sample before feeding it into the LSTM. My model looks like this:

        t = DynamicAxis()

        featuresCount = $featuresCountC$        
        features = Input(featuresCount)     
        label = Input(1, dynamicAxis = t)

        model = Sequential(         
            ConvolutionalLayer {8, (240:1), pad = false} :
            RecurrentLSTMLayer {7, goBackwards=false, allowOptimizedEngine = false} :
            BS.Sequences.Last
        )
        result = model(features)        
        resultP = ReconcileDynamicAxis(result, label)
        errs = SquareError(label, resultP)

I get this validation error:

Validating network. 39 nodes to process in pass 1.

Validating --> label = InputValue() :  -> [1 x t]
Validating --> model.arrayOfFunctions[1].lstm.B = LearnableParameter() :  -> [28]
Validating --> model.arrayOfFunctions[1].lstm.W = LearnableParameter() :  -> [28 x 0]
Validating --> model.arrayOfFunctions[0].W = LearnableParameter() :  -> [240 x 1 x 0 x 8]
Validating --> features = InputValue() :  -> [240 x *]
Validating --> result.x.x.c = Convolution (model.arrayOfFunctions[0].W, features) : [240 x 1 x 0 x 8], [240 x *] -> [] FAILED

...

EXCEPTION occurred: Convolution input and kernel tensors must have the same rank.

Why am I getting that [28 x 0] in the dimensions? What am I doing wrong?

n17s commented 7 years ago

Hi, you are saying that your vectors are of size 240 and you are using a convolution of size 240. That has the effect of a dense layer, which is probably not what you want. If what you want is to convolve over the sequence axis, then what you can do is splice together the pastValue of the input, the input itself, and its futureValue, and feed that to a dense layer. That has the same effect as a convolution with a window size of 3. If that's not what you want, please elaborate below.
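A minimal BrainScript sketch of one reading of this suggestion; contextWindow3, dim, and the DenseLayer width 64 are placeholders for illustration, not anything from CNTK's library or this thread:

    dim = 240                                     # per-step feature dimension from the post
    contextWindow3 (x) = {
        xp = PastValue   (dim, x, timeStep = 1)   # x[t-1]; boundary steps get defaultHiddenActivation
        xf = FutureValue (dim, x, timeStep = 1)   # x[t+1]
        w  = Splice ((xp : x : xf), axis = 1)     # concatenate along the feature axis -> [3 * dim]
    }.w

    model = Sequential (
        contextWindow3 :                          # window-3 context, i.e. a width-3 convolution over time
        DenseLayer {64} :                         # stand-in width; pick what fits the task
        RecurrentLSTMLayer {7, goBackwards = false, allowOptimizedEngine = false} :
        BS.Sequences.Last
    )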

Worldexe commented 7 years ago

Well, that could be the next step. Right now I want to make this simple (and, well, maybe not very useful) setup work. Eventually there should be more convolution/pooling layers in the stack.

n17s commented 7 years ago

To address your original problem, it might be helpful to call Flatten after the convolution.
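A sketch of one way to read that, using NewReshape (which appears later in this thread) as the flattening step; the 8 matches the ConvolutionalLayer {8, ...} above, and the convolution itself still has to validate for this to run:

    model = Sequential (
        ConvolutionalLayer {8, (240:1), pad = false} :
        (x => NewReshape (x, 8)) :    # collapse the [1 x 1 x 8] conv output to a plain [8] vector
        RecurrentLSTMLayer {7, goBackwards = false, allowOptimizedEngine = false} :
        BS.Sequences.Last
    )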

Worldexe commented 7 years ago

Flatten did not help. I had to remove reductionRank from ConvolutionalLayer to make it work:

ConvolutionalLayerFixed {
    numOutputChannels,   # e.g. (1) or BS.Constants.None
    filterShape,         # e.g. (3:3)
    bias = true,
    activation = (x=>x),
    init = 'glorotUniform',
    initValueScale = 1,          # TODO: rename to initScale
    initBias = 0,
    #reductionRank = 1,          # TODO: support this
    stride = 1, pad = false,
    lowerPad = 0, upperPad = 0,
    maxTempMemSizeInSamples = 0
} = {
    outputChannelsShape = _AsArray (numOutputChannels)
    filterRank = Length (filterShape)
    W = ParameterTensor{_ConcatArrays (filterShape, outputChannelsShape), init = init, initValueScale = initValueScale, initFilterRank = filterRank, initOutputRank = -1}  # [ W x H x K ]  (reduction dim C removed)
    b = ParameterTensor(_ConcatArrays (Repeat (Length (filterShape), 1), outputChannelsShape), initValue = initBias)                                                       # [ 1 x 1 x     K ]
    sharing = true    # TODO: support this
    apply (x) = {
        c = Convolution (W, x, filterShape, mapDims = numOutputChannels, stride = stride, sharing = sharing, autoPadding = pad, lowerPad = lowerPad, upperPad = upperPad, maxTempMemSizeInSamples = maxTempMemSizeInSamples)
        res = activation (if bias then c + b else c)
    }.res
}.apply

t = DynamicAxis()

featuresCount = $featuresCountC$        
features = Input(featuresCount, dynamicAxis = t)
featuresShaped = NewReshape(features, (featuresCount:1))

featuresConvolved = ConvolutionalLayerFixed{1, (featuresCount:1), pad = false} (featuresShaped)

label = Input(1)

model = Sequential(         
    RecurrentLSTMLayer {1, goBackwards=false, allowOptimizedEngine = false} :
    BS.Sequences.Last
)
result = model(featuresConvolved)

resultP = ReconcileDynamicAxis(result, label)

errs = SquareError(label, resultP)

Still not sure this will not break something.
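A shape summary of why removing the reduction dimension helps, pieced together from the validation log and the layer source quoted above (an annotation, not something stated in the thread):

    # Stock ConvolutionalLayer {8, (240:1)} with reductionRank = 1 builds its kernel as
    #     filterShape x inferredChannels x K  ->  [240 x 1 x 0 x 8]
    # where the 0 is an input-channel dimension still to be inferred. The raw input
    # [240 x *] has tensor rank 1, so validation fails with "Convolution input and
    # kernel tensors must have the same rank".
    #
    # ConvolutionalLayerFixed drops that reduction dimension, and the input is
    # reshaped to (featuresCount:1), so the kernel becomes
    #     filterShape x K  ->  [240 x 1 x 1]
    # which pairs with the reshaped [240 x 1] input. (The [28 x 0] on the LSTM's W is
    # unrelated: a 0 there is just a dimension CNTK infers in a later validation pass.)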

Worldexe commented 7 years ago

Now I get

cuDNN failure 9: CUDNN_STATUS_NOT_SUPPORTED ; GPU=0 ; hostname=deeplearning4 ; expr=cudnnConvolutionBackwardFilter(*m_cudnn, &C::One, m_inT, ptr(in), m_outT, ptr(srcGrad), *m_conv, m_backFiltAlgo.selectedAlgo, ptr(workspace), workspace.BufferSize(), accumulateGradient ? &C::One : &C::Zero, *m_kernelT, ptr(kernelGrad))

It would be kinda helpful if there were some indication of which parameter value is 'not supported'.