xgirones closed this issue 6 years ago
It's very simple: put a CNN between the input and the recurrent layer. CNTK will automatically broadcast the same CNN (in general, any embedding) over every frame in the sequence.
def flatten(input):
    assert len(input.shape) == 3
    return C.reshape(input, input.shape[0] * input.shape[1] * input.shape[2])

# replace this with your own CNN
def CNN(input):
    h = C.layers.Convolution(filter_shape=(3,3), num_filters=16, strides=(1,1))(input)
    return flatten(h)

def create_model(input):
    h = CNN(input)
    h = BiRecurrence(C.layers.LSTM(lstm_dim//2), C.layers.LSTM(lstm_dim//2))(h)
    return C.layers.Dense(num_classes)(h)

x = C.sequence.input_variable(shape=co, name="input")
netoutput = create_model(x)  # done
I recommend NOT using a placeholder, which has 2 clear advantages:
Thank you for your answer. I tried your suggestion and now I am getting an error in the CNN function:
ValueError: Convolution map tensor must have rank 1 or the same as the input tensor.
If I modify CNN to print its input:
def CNN(input):
    print(input)
    h = C.layers.Convolution(filter_shape=(3,3), num_filters=16, strides=(1,1))(input)
    return flatten(h)
This is the layout it reports:
Input('input', [#, *], [24])
What am I doing wrong? Could it be the definition of the input variable?
x = C.sequence.input_variable( shape=24, name="input" )
It works with Dense followed by LSTM, but I do not know whether it should be redefined for a CNN.
In my sample code, I assumed that the input is a rank-3 tensor, e.g. an image. You will have to modify the CNN function, as well as flatten(), to suit your data format.
Thank you for your answer. My input is a list of grayscale images where each image has a different number of rows (the number of columns is always 24). How should I define the CNN function to work with this format?
Not possible. You have to rescale all images to the same width x height x channels.
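As a rough illustration of the rescaling suggested above, here is a minimal NumPy sketch that resamples every image to a common number of rows with nearest-neighbour selection. The helper name `resize_rows`, the target height of 32, and the sample row counts are assumptions made for this example; they do not come from the thread, and a real pipeline would probably use a proper image-resizing routine.

```python
import numpy as np

def resize_rows(image, target_rows):
    """Nearest-neighbour resampling of a (rows, cols) array to (target_rows, cols)."""
    rows = image.shape[0]
    # Pick target_rows source-row indices spread evenly over [0, rows - 1].
    idx = np.round(np.linspace(0, rows - 1, target_rows)).astype(int)
    return image[idx]

# Hypothetical minibatch: three grayscale images with 24 columns each.
images = [np.random.rand(r, 24).astype(np.float32) for r in (17, 30, 42)]
fixed = np.stack([resize_rows(img, 32) for img in images])
print(fixed.shape)  # (3, 32, 24)
```

Note that resampling changes the number of rows, so any per-row labels would have to be resampled the same way.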
Thanks again for your response. In that case I think this would be a great feature to have in CNTK. One of the reasons I am using LSTM is because the number of rows in the images is not fixed. It should be possible to reshape the input sequence to a tensor compatible with CNN and then reshape it again to a sequence of feature vectors suitable for LSTM.
I get what you mean. So it's not a sequence of frames; you want to treat an image as a sequence of columns. I don't know your particular needs, but the principle still holds: you have to preprocess your data appropriately before feeding it to the CNTK trainer.
In my case, being forced to supply a fixed layout for the CNN defeats the purpose of using an LSTM later. So far I am already obtaining good results with LSTM alone, but they are costly in terms of processing time. I would have liked to study whether a CNN+LSTM model could achieve the same accuracy as Dense+LSTM with fewer LSTM cells, and whether there would be a gain in speed.
In the current framework, to my understanding, a graph requires fixed-size input (along the static axes) in order to analyze the forward and backward passes before training. In your case, each input has a fixed size of 24. If you want to train end-to-end with CNN+LSTM, you can only apply the CNN on columns, or on column-wise image patches (n x 24, where n is constant).
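One way to read the "n x 24 patches" idea is to turn each variable-height image into a sequence of overlapping windows of n consecutive rows, so that every sequence element has the same static shape. The NumPy sketch below is only an illustration under that assumption; `row_patches` and the 10-row sample image are hypothetical, not part of the thread.

```python
import numpy as np

def row_patches(image, n):
    """Split a (rows, cols) image into a sequence of overlapping (1, n, cols) patches."""
    rows, cols = image.shape
    if rows < n:
        raise ValueError("image has fewer rows than the patch height")
    # One patch per valid starting row; the leading 1 is the channel axis.
    patches = [image[i:i + n] for i in range(rows - n + 1)]
    return np.stack(patches).reshape(-1, 1, n, cols)

image = np.arange(10 * 24, dtype=np.float32).reshape(10, 24)
seq = row_patches(image, n=3)
print(seq.shape)  # (8, 1, 3, 24)
```

With this layout the sequence input would presumably be declared with a static shape of (1, n, 24). The sequence becomes n - 1 steps shorter than the image, so per-row labels would need to be trimmed or the image padded to match.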
Yes, I tried doing a 1D convolution on columns only, but I got a cuDNN error complaining about an unsupported operation (I do not remember the error code).
Redefine your input as
x = C.sequence.input_variable( shape=(1,1,24), name="input" )
so that the input is a rank-3 tensor (1 channel x 1 row x 24 columns). Then define the convolutional kernel like this:
C.layers.Convolution(filter_shape=(1,3), num_filters=16, strides=(1,2))
It's a 1-D kernel of size 3 along the column axis with stride 2.
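To see what the kernel above does to the feature dimension, the following sketch computes the output width and the flattened size fed to the next layer. It assumes no padding (which, as far as I know, is the default of `C.layers.Convolution`); the padded branch uses the common "same"-style convention and is included only for comparison. `conv_output_size` is a hypothetical helper, not a CNTK function.

```python
import math

def conv_output_size(n, kernel, stride, pad=False):
    """Output length of a 1-D convolution over n positions."""
    if pad:
        # "Same"-style padding convention: one output per stride step.
        return math.ceil(n / stride)
    # No padding: only positions where the whole kernel fits.
    return (n - kernel) // stride + 1

num_filters = 16
out_cols = conv_output_size(24, kernel=3, stride=2)
flat_dim = num_filters * 1 * out_cols  # filters x rows x columns
print(out_cols, flat_dim)  # 11 176
```

So, under these assumptions, each 24-column row becomes a 176-dimensional feature vector after flattening.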
Thanks, I have just tried it but got a CUDNN_STATUS_NOT_SUPPORTED error. It would be great if a CNTK developer could step in and confirm that what we are trying to do is not supported.
RuntimeError: cuDNN failure 9: CUDNN_STATUS_NOT_SUPPORTED ; GPU=0 ; hostname=HOST; expr=cudnnConvolutionBackwardFilter(*m_cudnn, &C::One, m_inT, ptr(in), m_outT, ptr(srcGrad), *m_conv, m_backFiltAlgo.selectedAlgo, ptr(workspace), workspace.BufferSize(), accumulateGradient ? &C::One : &C::Zero, *m_kernelT, ptr(kernelGrad))
[CALL STACK]
> Microsoft::MSR::CNTK::CudaTimer:: Stop
- Microsoft::MSR::CNTK::CudaTimer:: Stop (x2)
- std::enable_shared_from_this<Microsoft::MSR::CNTK::MatrixBase>:: shared_from_this (x3)
- CNTK::Internal:: UseSparseGradientAggregationInDataParallelSGD
- CNTK:: CreateTrainer
- CNTK::Trainer:: TotalNumberOfUnitsSeen
- CNTK::Trainer:: TrainMinibatch (x2)
- PyInit__cntk_py (x2)
- PyEval_EvalFrameDefault
- Py_CheckFunctionResult
- PyObject_CallFunctionObjArgs
It would be helpful if you posted your complete code here. I have implemented this type of model before and I didn't have any problems with CNTK.
Sure, this is the code I used to create the model.
def flatten(input):
    #assert (len(input.shape) == 3)
    #return C.reshape(input, input.shape[0]*input.shape[1]* input.shape[2])
    return C.reshape(input, (-1,))

# replace this with your own CNN
def CNN(input):
    h=C.layers.Convolution(filter_shape=(1,3), num_filters=16, strides=(1,2), activation = C.leaky_relu)
    return flatten(h)

def create_model( sampler ):
    pre_dim = 64
    lstm_dim = 128
    dense_dim = 96
    s0, lbl0 = sampler.generate_samples()
    minibatch_size = len(s0)
    co = s0[0][0].shape[0] # 24
    num_classes = lbl0[0][0].shape[0]
    x = C.sequence.input_variable( shape=(1,1,co), name="input" )
    y = C.sequence.input_variable( shape=num_classes, name="output_1" )
    model = C.layers.Sequential([CNN,
        C.layers.Dense(pre_dim2, activation = C.leaky_relu),
        BiRecurrence(C.layers.LSTM(lstm_dim//2, activation=C.softsign),C.layers.LSTM(lstm_dim//2,activation=C.softsign)),
        C.layers.Dense(dense_dim, activation = C.leaky_relu),
        C.layers.Dense(num_classes, activation = None)])(x)
    return model
And during training I am using the following function
def reshape_minibatch(mb):
    return [ np.reshape(x,(-1,1,1,24)) for x in mb]
To convert the original
[ (rows_1, 24), (rows_2, 24), ... (rows_n, 24) ]
input data layout to the new one
[ (rows_1, 1, 1, 24), (rows_2, 1, 1, 24), ... (rows_n, 1, 1, 24) ]
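The layout conversion described above can be checked in isolation with NumPy. This is a small sketch of the same reshape; the `cols` parameter and the sample row counts (5, 9, 13) are assumptions added for illustration.

```python
import numpy as np

def reshape_minibatch(mb, cols=24):
    """Turn each (rows, cols) array into a (rows, 1, 1, cols) sequence of rank-3 tensors."""
    return [np.reshape(x, (-1, 1, 1, cols)) for x in mb]

# Hypothetical minibatch with three images of different heights.
mb = [np.zeros((r, 24), dtype=np.float32) for r in (5, 9, 13)]
print([x.shape for x in reshape_minibatch(mb)])
# [(5, 1, 1, 24), (9, 1, 1, 24), (13, 1, 1, 24)]
```

Each row of an image becomes one sequence element of shape (1, 1, 24), matching an input variable declared with shape=(1,1,24).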
two problems:
Thanks a lot! I made the changes you suggested and I have been able to train the model. Now I will run some experiments to see if I can reduce the required capacity of the LSTM layer by incorporating some CNN preprocessing, and hope that 2D convolutions are supported in the future.
Is there any example on how to combine LSTM with CNN for image data?
My input data consists of a list of B arrays of size Si x 24, where B is the minibatch size, Si is the number of rows of the i-th array in the sequence, and 24 is the number of columns. My goal is to predict a label for each column of the images in the sequence.
Using this data layout, I am able to train a simple LSTM only model such as the following
Now I would like to preprocess the images in the minibatch using a CNN stack and feed the output to the LSTM, but I have no idea how to proceed. Can anyone help me?
Thanks in advance.