keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

LSTM different case of sequences, doubts in general and CNN+LSTM network for regression problem #5425

Closed Mako18 closed 7 years ago

Mako18 commented 7 years ago

Hello, thanks in advance for your help, and thanks to the developers of Keras!

I am working with LSTM networks; specifically, I am trying to create a CNN+LSTM network that takes images with 3 channels as inputs. I have been reading a lot, but I still have several doubts about how LSTM layers really work, because the results I am obtaining in my experiments are horrible, and these networks are said to give great results. I have read #4149 and #2403, and I have cleared my mind enough to know that I still have a lot to learn.

First I will describe my task, and then enumerate the doubts I have about recurrent layers.

My inputs are images with 3 channels. I have reshaped my data set with the following code in order to obtain sequences in time:

import scipy.io
import numpy as np

data = scipy.io.loadmat('cnn_1p.mat')
X = data["imgL"]
Y = data["target"]

X_seq = []
seq_len = 2
for i in range(len(X) - seq_len + 1):
    X_seq.append(X[i:i+seq_len, :, :, :])
X_seq = np.asarray(X_seq)
Y_seq = Y[(seq_len-1):10*len(Y), :]

And that gives me structures with the following shapes:

In [173]: X_seq.shape
Out[173]: (2126, 2, 3, 10, 8)

In [174]: Y_seq.shape
Out[174]: (2126, 3)

So if I am right, that means I have nb_samples=2126 (number of samples), each sample is a sequence of length 2, and each element of that sequence is an image with 3 channels and dimensions 10x8. Am I right?

My output is a matrix of dimensions (nb_samples, 3), so each input image has 3 numbers associated with it as outputs. What I want is to feed my net with my input sequences of images, so that each sequence contains the images for t-1 and t, and I want the net to give me as output the 3 numbers associated with the image at time t. I have read a lot about problems where the sequences are t-1, t and the output is t+1, but I want the output that corresponds to the last element of my input sequence.
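To make that alignment concrete, here is a tiny self-contained check (the array sizes here are made up; only the slicing mirrors my code above): the target paired with each sequence is the one belonging to the sequence's last frame t, not t+1.

import numpy as np

# Toy data: 5 "images" of 3 channels and 10x8 pixels, one 3-number target each.
X = np.random.rand(5, 3, 10, 8)
Y = np.random.rand(5, 3)

seq_len = 2
X_seq = np.asarray([X[i:i+seq_len] for i in range(len(X) - seq_len + 1)])
Y_seq = Y[seq_len-1:]                 # target of the last frame of each sequence

print(X_seq.shape)                    # (4, 2, 3, 10, 8)
print(Y_seq.shape)                    # (4, 3)
assert np.allclose(Y_seq[0], Y[1])    # sequence 0 = frames (0, 1) -> target of frame 1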

With this in mind, I don't know if that is what I am doing with this net:

from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, LSTM
from keras.layers import Convolution2D, MaxPooling2D, TimeDistributed

model = Sequential()
model.add(TimeDistributed(Convolution2D(40, 3, 3, border_mode='valid', activation='relu'),
                          input_shape=(seq_len, 3, 10, 8)))
model.add(TimeDistributed(Dropout(0.2)))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Convolution2D(20, 2, 2, border_mode='valid', activation='relu')))
model.add(TimeDistributed(Dropout(0.2)))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(30, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(15))
model.add(Dropout(0.2))
model.add(Dense(3, init='uniform'))
model.compile(optimizer='adam', loss='mse')

So as far as I know, with TimeDistributed I make sure that the convolutional layer is applied to each element of the sequence separately. And I added return_sequences=True in the first LSTM layer to connect it with the second LSTM layer. Finally, the Dense layer is there to obtain the 3 outputs I need.
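For example, a minimal shape check of just the first layer (a sketch using the same old Keras 1 API as above, and assuming the channels-first image ordering implied by the (3, 10, 8) input shape) shows that the time axis is kept and the same convolution is applied per timestep:

from keras.models import Sequential
from keras.layers import Convolution2D, TimeDistributed

check = Sequential()
check.add(TimeDistributed(Convolution2D(40, 3, 3, border_mode='valid', activation='relu'),
                          input_shape=(2, 3, 10, 8)))
# summary() should report an output shape like (None, 2, 40, 8, 6):
# the time axis of length 2 is preserved, and each 3x10x8 frame becomes
# 40 feature maps of 8x6, produced by the same shared convolution weights.
check.summary()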

And here come my doubts (both general and about my problem):

  1. With this network, am I calculating the output for the last element of the sequences, or the output for the future instant t+1?

  2. Is my data reshaped correctly?

  3. I don't really understand all the parameters that the LSTM and recurrent layers have (I have read the Keras documentation, but it is not clear to me). Moreover, I don't understand the difference between the cases in this image: (attached image: e4cdf91c-063f-11e6-8844-c89a9e134339)

I don't understand the difference, and I don't know how I can program the layer to obtain the different cases.

  4. I have read that it is recommended to use Reshape instead of Flatten() to connect the CNN layer with the LSTM layer, but for me the Reshape is not working.

  5. Am I using the TimeDistributed layer correctly? I have read this: https://github.com/fchollet/keras/blob/master/examples/imdb_cnn_lstm.py . I know it is 1D instead of 2D, but in that example they use neither TimeDistributed nor a Flatten() layer.

I think that's all for the moment. Sorry about the long post; I hope some of you can help me.

unrealwill commented 7 years ago

Hello,

5. Yes.

4. You are using TimeDistributed(Flatten()), so this is correct.

3. Currently your architecture is many-to-one. You seem to be on the right track. By playing a little with the return_sequences flag and the RepeatVector layer you can probably work out the rest of the architectures on your own (see the sketch after these numbered answers).

2. Y_seq = Y[(seq_len-1):10*len(Y),:] is confusing; use Y_seq = Y[(seq_len-1):,:] instead.

1. You are calculating the output for the last element of the sequence. (More exactly, you are calculating an output based on the information contained in all elements of your sequence.)
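For reference, a minimal sketch of those variants (assuming that "Repeat" refers to Keras's RepeatVector layer; all sizes below are made up):

from keras.models import Sequential
from keras.layers import LSTM, Dense, RepeatVector, TimeDistributed

timesteps, dim = 2, 100   # made-up sizes

# many-to-one: return_sequences defaults to False, so only the last output is kept
m2o = Sequential()
m2o.add(LSTM(30, input_shape=(timesteps, dim)))
m2o.add(Dense(3))

# many-to-many: return_sequences=True gives one output per timestep
m2m = Sequential()
m2m.add(LSTM(30, return_sequences=True, input_shape=(timesteps, dim)))
m2m.add(TimeDistributed(Dense(3)))

# one-to-many: RepeatVector copies a single input vector across timesteps,
# and the LSTM then emits a sequence from it
o2m = Sequential()
o2m.add(RepeatVector(timesteps, input_shape=(dim,)))
o2m.add(LSTM(30, return_sequences=True))
o2m.add(TimeDistributed(Dense(3)))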

The fact that your seq_len=2 is confusing, because an LSTM uses all the information before t, not just t-1 and t (which happens to coincide in your case because seq_len=2). If you want to use only the information from t-1 and t, then use masked convolutions.

Because your seq_len=2, you are not really using the power of the LSTM, and you may be missing the point. Finally, many-to-one architectures are harder to train because they are fundamentally a harder problem: the network doesn't know which timestep holds the relevant information, and it has to save that information in its internal state, so it is not easy, especially with layer sizes this small.

Mako18 commented 7 years ago

Hi, thanks for replying!

About point 3, how do you know my problem is many-to-one? I think I would like to have a one-to-one network. Anyway, I will check the RepeatVector layer you mentioned.

In case I do need the many-to-one architecture, should I increase seq_len and the number of units in the LSTM layers? The thing is that for my problem I can't have a seq_len that is too large; the ideal would be seq_len=2. So do you recommend using only convolutional layers? I was quite stuck, because I have read about the amazing results of CNN+LSTM, so I thought it could work for my problem.

Thanks about point 2!

unrealwill commented 7 years ago

Hello. Your problem takes a sequence of seq_len (=2, i.e. "many") elements as input and produces a single ("one") output. If you want to have a one-to-one network, just concatenate (Merge(mode="concat")) your input sequence into a vector of 2 times the dimension.
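A rough sketch of that one-to-one reading, in the old Keras 1 two-branch style (the branch names and variable names below are made up; each branch takes one of the two frames, and the concatenation yields a vector of twice the dimension):

from keras.models import Sequential
from keras.layers import Merge, Dense, Flatten

# one branch per timestep; each flattens a 3x10x8 frame into a 240-vector
branch_t0 = Sequential()
branch_t0.add(Flatten(input_shape=(3, 10, 8)))
branch_t1 = Sequential()
branch_t1.add(Flatten(input_shape=(3, 10, 8)))

# concatenate the two 240-vectors into one 480-vector, then regress the 3 targets
one_to_one = Sequential()
one_to_one.add(Merge([branch_t0, branch_t1], mode='concat'))
one_to_one.add(Dense(3))
one_to_one.compile(optimizer='adam', loss='mse')

# training then takes a list of the two frame arrays, e.g.
# one_to_one.fit([X_seq[:, 0], X_seq[:, 1]], Y_seq)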

Layers should usually have between 100 and 1000 units. One rule of thumb is that the amount of memory your network has (and therefore its representational power) is proportional to its number of parameters. (To count the number of parameters, use model.summary().)
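For instance, with the model defined in the first post:

model.summary()               # prints each layer's output shape and parameter count
print(model.count_params())   # total number of parameters as a single integer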

You will have to find what works best for your problem, but LSTMs usually take longer to train than CNNs, and their main feature, allowing sequences of arbitrary length, is irrelevant in your case.

Also, you don't seem to have a lot of data, so an SVM would probably work better.
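A possible baseline along those lines (a sketch using scikit-learn; the MultiOutputRegressor wrapper is one way to handle the 3-number regression target, since SVR itself is single-output):

from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

# flatten each (2, 3, 10, 8) sequence into one feature vector per sample
X_flat = X_seq.reshape(X_seq.shape[0], -1)

svr = MultiOutputRegressor(SVR(kernel='rbf', C=1.0))
svr.fit(X_flat, Y_seq)
predictions = svr.predict(X_flat)   # in practice, fit and evaluate on separate splits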

stale[bot] commented 7 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs, but feel free to re-open it if needed.