keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

Convolution layer + LSTM layer, input_dim error #5032

Closed Mako18 closed 7 years ago

Mako18 commented 7 years ago

Hi, I'm trying to design a net combining CNN layers with an LSTM layer. I have three inputs that are gray-scale images. First I create a CNN layer for each input so it can learn spatial patterns, and after that I merge the three inputs. Now I want my net to learn temporal patterns, so I have 1730 samples (each sample is a different time step) for each input. The problem is that when I merge them and try to add the LSTM layer, it gives me the following error:

ValueError: Input 0 is incompatible with layer lstm_6: expected ndim=3, found ndim=4

My input shapes are (1, 8, 10) (80-pixel images). My code is:

Izda=Sequential()
Izda.add(Convolution2D(40,3,3, input_shape=(1,8,10), border_mode='same'))
Izda.add(Activation('relu'))
Izda.add(MaxPooling2D(pool_size=(2,2)))
Izda.add(Dropout(0.2))

Dcha=Sequential()
Dcha.add(Convolution2D(40,3,3, input_shape=(1,8,10), border_mode='same'))
Dcha.add(Activation('relu'))
Dcha.add(MaxPooling2D(pool_size=(2,2)))

Frt=Sequential()
Frt.add(Convolution2D(40,3,3, input_shape=(1,8,10), border_mode='same'))
Frt.add(Activation('relu'))
Frt.add(MaxPooling2D(pool_size=(2,2)))

merged=merge([Izda, Dcha, Frt], mode='concat', concat_axis=1)

model=Sequential()
model.add(merged)
model.add(LSTM(240))

I would like to know if there is a way to build a network with a convolutional layer and an LSTM layer on 2D inputs. I have seen examples with words and sentences, but not with images, and I don't know what kind of input the LSTM expects.

Thanks.

patyork commented 7 years ago

You would need a Flatten() layer, possibly with the TimeDistributed wrapper; LSTM expects an input of shape (batch_size, timesteps, features), which is 3-dimensional, but you are passing it an input of shape (batch_size, 3*pool_features, nb_rows, nb_cols), which comes out to (batch_size, 120, 4, 5) and is therefore ndim=4.
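
As an illustration (not part of the original reply), here is a minimal sketch of what TimeDistributed(Flatten()) does to the merged feature maps, assuming the (120, 4, 5) per-frame shape mentioned above; it collapses each frame to a vector so the LSTM sees 3D input:

from keras.models import Sequential
from keras.layers import TimeDistributed, Flatten, LSTM

m = Sequential()
# per-timestep input: 120 feature maps of size 4x5, variable sequence length
m.add(TimeDistributed(Flatten(), input_shape=(None, 120, 4, 5)))
# each timestep is now a vector of 120*4*5 = 2400 features
m.add(LSTM(240))
m.summary()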

Mako18 commented 7 years ago

I've tried that too, and in that case the error is that ndim=2 when the LSTM expects ndim=3; that's why I deleted the Flatten layer, so I don't know what to do or whether it is even possible.

patyork commented 7 years ago
from keras.models import Sequential
from keras.layers import Convolution2D, Activation, MaxPooling2D, Dropout, LSTM, Flatten, Merge, TimeDistributed
import numpy as np

# Generate fake data
# Assumed to be 1730 grayscale video frames
x_data = np.random.random((1730, 1, 8, 10))

sequence_lengths = None

Izda=Sequential()
Izda.add(TimeDistributed(Convolution2D(40,3,3,border_mode='same'), input_shape=(sequence_lengths, 1,8,10)))
Izda.add(Activation('relu'))
Izda.add(TimeDistributed(MaxPooling2D(pool_size=(2,2))))
Izda.add(Dropout(0.2))

Dcha=Sequential()
Dcha.add(TimeDistributed(Convolution2D(40,3,3,border_mode='same'), input_shape=(sequence_lengths, 1,8,10)))
Dcha.add(Activation('relu'))
Dcha.add(TimeDistributed(MaxPooling2D(pool_size=(2,2))))
Dcha.add(Dropout(0.2))

Frt=Sequential()
Frt.add(TimeDistributed(Convolution2D(40,3,3,border_mode='same'), input_shape=(sequence_lengths, 1,8,10)))
Frt.add(Activation('relu'))
Frt.add(TimeDistributed(MaxPooling2D(pool_size=(2,2))))
Frt.add(Dropout(0.2))

merged=Merge([Izda, Dcha,Frt], mode='concat', concat_axis=2)
# Output from merge is (batch_size, sequence_length, 120, 4, 5)
# We want to get this down to (batch_size, sequence_length, 120*4*5)

model=Sequential()
model.add(merged)
model.add(TimeDistributed(Flatten()))
model.add(LSTM(240, return_sequences=True))

model.compile(loss='mse', optimizer='adam')
model.summary()

sequence_lengths is currently set to None, meaning you should be able to feed the network any slice of your long sequence.

Example using the fake data defined in the above script:

# Slice our long, single sequence up into shorter sequences of images
# Let's make 50 examples of 15 frame videos
x_train = []
seq_len = 15
for i in range(50):
    x_train.append(x_data[i*5:i*5+seq_len, :, :, :])
x_train = np.asarray(x_train, dtype='float32')
print x_train.shape
# >> (50, 15, 1, 8, 10)

# We need some fake targets now as well
y_train = np.ones((50, 15, 240))

# Fit on this data
# I'm not sure what the 3 separate Conv models are supposed to do, so let's just feed the input 3 times
model.fit([x_train,x_train,x_train], y_train, batch_size=5, nb_epoch=2, verbose=2)
Epoch 1/2
1s - loss: 0.4043
Epoch 2/2
1s - loss: 0.0308
Mako18 commented 7 years ago

Great, I will try this code and tell you if it works. I have a doubt: I have read the TimeDistributed info and I don't get why sequence_lengths=None. On keras.io there is an example of a TimeDistributed layer applied to a Convolution2D with a 3-channel image, and it says that input_shape=(10, 3, 299, 299) in the TimeDistributed layer. So I wonder why in this case it isn't sequence_lengths=1730. Another doubt is about concat_axis: I set it to 1, why have you set it to 2? Thanks

Mako18 commented 7 years ago

When I write your code (with sequence_lengths=None and concat_axis=2) I get the following error: ValueError: Error when checking model input: expected timedistributed_input_6 to have 5 dimensions but got array with shape (1730, 1, 8, 10). I obtain the same error if I write sequence_lengths=1730 instead of None and with concat_axis=1, so I don't know where it comes from. It gives me this error when I try to train the net, so I guess it is something about my inputs. Here is my training line: model.fit([izt, dct, frt], W_dt, nb_epoch=1000, batch_size=20, verbose=1), where izt, dct and frt are my 3 input images, which I have reshaped as follows: izt = izt.reshape(izt.shape[0], 1, 8, 10) with izt.shape[0] = 1730, so I put them in image format.

Mako18 commented 7 years ago

It still doesn't work for me. I have tried exactly the same example as you, but with my own targets (they have 1730*3 dimensions) and I still get an input_shape error, although a different one.

Also, I don't get why you have to split the 1730 into 50 sequences of length 15; what happens with the 1730?

patyork commented 7 years ago

sequence_lengths can be set to any integer if you want your model to only accept, for example, a sequence of length 29 or 1730; None allows it to accept sequences of any length. concat_axis=2 instead of 1 because the TimeDistributed wrapper adds another dimension (a time dimension), so the concatenation axis is offset by 1 as well.
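
To make the axis offset concrete, here is a small illustration (mine, not from the original reply) using NumPy shapes only:

import numpy as np

# per-branch output without a time axis: (batch, channels, rows, cols)
a = np.zeros((5, 40, 4, 5))
print(np.concatenate([a, a, a], axis=1).shape)   # (5, 120, 4, 5)

# per-branch output with TimeDistributed: (batch, time, channels, rows, cols)
b = np.zeros((5, 15, 40, 4, 5))
print(np.concatenate([b, b, b], axis=2).shape)   # (5, 15, 120, 4, 5)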

The model needs a 5-dimensional input of shape (batch_size, seq_length, channels, img_h, img_w); the error you're getting is because you are not passing in a batch_size dimension.

I split the long input sequence into smaller sequences because the model will learn nothing from one example (of length 1730); it will memorize the expected outputs to minimize the error and will not generalize. If you have only this one sequence/video of 1730 frames, you will either need to split it up into smaller sequences, apply augmentation to get more sequences of length 1730, or forgo trying to apply a time aspect to your model (because you simply won't have enough data).

Here is an example of how to train on your one large example, without splitting it into subsequences. Note: I'm generating completely random data:

# Let's just train on our 1 example
# Because we only have 1 example, our batch_size will be 1
x_train = x_data.reshape(([1] + list(x_data.shape)))
print x_train.shape, x_train.ndim == 5
# >> (1, 1730, 1, 8, 10) True

# We need some fake targets now as well
y_train = np.ones((1, 1730, 240))

# Fit on this data
# I'm not sure what the 3 separate Conv models are supposed to do, so let's just feed the input 3 times
model.fit([x_train,x_train,x_train], y_train, batch_size=1, nb_epoch=25, verbose=2)

If I run the above (on completely random data, just the one sample seen once per epoch), the loss goes from 1.05 down to 0.0793; in other words, the model memorized the fake data really well, but it is a useless model that won't generalize; it has simply memorized the data.

Epoch 1/50
3s - loss: 1.0516   # high loss
Epoch 2/50
2s - loss: 0.8490
...
# After seeing the sequence 25 times, training time of 50 seconds
Epoch 24/50
2s - loss: 0.0793
Epoch 25/50
2s - loss: 0.0793  # loss is now quite low, because the model simply memorizes what is expected of it

I highly recommend you read about Recurrent Networks and review the Keras documentation and the examples.

Mako18 commented 7 years ago

Thanks for your answer. I have read the Keras documentation and the examples in it, but haven't found help with my problem because I don't fully understand how the TimeDistributed layer works. I understand the concat_axis issue now; I wasn't taking the time dimension into account, thanks. I don't want a useless net, so I will try to understand the sequence split you have made and apply it to my image inputs. Thank you.

patyork commented 7 years ago

Depending on what you are trying to do, splitting the long sequence up into smaller sequences and training on those might not make sense.

If splitting the sequence up doesn't make sense, you'll need to read more about data augmentation to transform your 1 example into many (similar, but slightly different) examples; as I've said, just one example will not lead to a good generalized network - more data is needed.

patyork commented 7 years ago

Also, there's no guarantee that simply splitting up the sequence into shorter sequences will do any better; there's a good chance that data augmentation would be required there as well. 1730 image frames is simply not very much data to work with.

Mako18 commented 7 years ago

Hi again. I have been reading a lot about this issue and still have several doubts. Let me explain my problem better: I have 3 input images per time step (gray-scale) and each one has dimensions (1, 8, 10). I want to feed the 3 images for t1 at the same time, then the next three for t2, and so on, and I want my network to give me an array with 3 elements per time step (so I input 3 images and the network gives me 3 values). I cannot reorder my images, so here comes my first doubt: how should I arrange my dataset in order to feed 3 images at the same time as one input? I have thought about making a structure with shape (1, 8, 30) and trying to tell the net that every 10 columns define an image, but I don't know if I can do that or how, because I have 1730 examples. Or maybe there is a way to tell the net to take the three images at the same time.

I have read more carefully about your data split, and in my case I guess I can split my examples so that instead of having 1730 examples, I have 865 examples, where each example has three images for t1 and three images for t2. What I mean is:

example 1 ----> [(i1,d1,f1), (i2,d2,f2)], where (i1,d1,f1) is the structure with the 3 images for t1 and (i2,d2,f2) the 3 images for t2.

I have read about an example trying to understand the input shape TimeDistributed needs. In this example the author says that if she had 1000 phrases with 10 words in each one, and each word represented by a 3D vector, she would have nb_samples=1000, time_steps=10 and input_dim=3. In my case, for the example I described above, would it be nb_samples=865, time_steps=2 and input_dim=(1, 8, 30)? Or maybe input_dim can be another structure that can differentiate the 3 images.
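
For what it's worth, here is a small sketch (not from the thread, using random stand-ins for the izt/dct/frt arrays mentioned earlier) of two possible ways to arrange the 1730 frame triples into 865 two-step sequences:

import numpy as np

izt = np.random.random((1730, 1, 8, 10))   # stand-ins for the real images
dct = np.random.random((1730, 1, 8, 10))
frt = np.random.random((1730, 1, 8, 10))

# Option 1: place the 3 images side by side along the columns -> (1, 8, 30) per time step
wide = np.concatenate([izt, dct, frt], axis=3)      # (1730, 1, 8, 30)
x_wide = wide.reshape(865, 2, 1, 8, 30)             # nb_samples=865, time_steps=2

# Option 2: stack the 3 images as channels -> (3, 8, 10) per time step
chan = np.concatenate([izt, dct, frt], axis=1)      # (1730, 3, 8, 10)
x_chan = chan.reshape(865, 2, 3, 8, 10)

print(x_wide.shape, x_chan.shape)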

Thanks very much, I'm trying my best to understand this.

wookaka commented 7 years ago

Are you talking about a 3D CNN? A 3D CNN can process multiple images at the same time. You want the features of the first three images to be the first step of the LSTM, then the features of the next three images to be the second step. Am I right? @Mako18

Mako18 commented 7 years ago

Yes, that's right. The first input of the LSTM should be the features (from a convolution layer) of the first 3 images, then the second input of the LSTM should be the features of the next 3 images, and so on. Can you help me?

TheRed002 commented 7 years ago

@Mako18 have you found any solution? I am running into exactly the same scenario and am totally new to ML. I can't figure out the TimeDistributed layer: its input, when to use it and when not to. I have seen examples that join a CNN to an LSTM without applying a TimeDistributed layer, and others that say to apply it. Can somebody explain this more clearly with a simple example defining the problem and all its parameters, and then the code? That would be highly appreciated.

stale[bot] commented 7 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

HTLife commented 6 years ago

@patyork How do I update your example to the Keras 2 API standard? I've updated almost all functions to Keras 2. However, I can't successfully migrate the Merge call.

/usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py:34: 
UserWarning: The `Merge` layer is deprecated and will be removed after 08/2017. 
Use instead layers from `keras.layers.merge`, e.g. `add`, `concatenate`, etc.
from keras.models import Sequential
from keras.layers import Activation, MaxPooling2D, Dropout, LSTM, Flatten, Merge, TimeDistributed
import numpy as np

from keras.layers import Concatenate

from keras.layers.convolutional import Conv2D

# Generate fake data
# Assumed to be 1730 grayscale video frames
x_data = np.random.random((1730, 1, 8, 10))

sequence_lengths = None

Izda=Sequential()
Izda.add(TimeDistributed(Conv2D(40,(3,3),padding='same'), input_shape=(sequence_lengths, 1,8,10)))
Izda.add(Activation('relu'))
Izda.add(TimeDistributed(MaxPooling2D(data_format="channels_first", pool_size=(2, 2))))
Izda.add(Dropout(0.2))

Dcha=Sequential()
Dcha.add(TimeDistributed(Conv2D(40,(3,3),padding='same'), input_shape=(sequence_lengths, 1,8,10)))
Dcha.add(Activation('relu'))
Dcha.add(TimeDistributed(MaxPooling2D(data_format="channels_first", pool_size=(2, 2))))
Dcha.add(Dropout(0.2))

Frt=Sequential()
Frt.add(TimeDistributed(Conv2D(40,(3,3),padding='same'), input_shape=(sequence_lengths, 1,8,10)))
Frt.add(Activation('relu'))
Frt.add(TimeDistributed(MaxPooling2D(data_format="channels_first", pool_size=(2, 2))))
Frt.add(Dropout(0.2))

merged=Merge([Izda, Dcha,Frt], mode='concat', concat_axis=2)
#merged=Concatenate()([Izda, Dcha, Frt], axis=2)
# Output from merge is (batch_size, sequence_length, 120, 4, 5)
# We want to get this down to (batch_size, sequence_length, 120*4*5)

model=Sequential()
model.add(merged)
model.add(TimeDistributed(Flatten()))
model.add(LSTM(240, return_sequences=True))

model.compile(loss='mse', optimizer='adam')
model.summary()
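
No answer follows in the thread, but one plausible Keras 2 rewrite of the deprecated Merge step is the functional API with concatenate. The following is a hedged sketch, not tested against this exact setup, assuming channels-first grayscale frames as in the code above:

from keras.models import Model
from keras.layers import Input, Conv2D, Activation, MaxPooling2D, Dropout
from keras.layers import LSTM, Flatten, TimeDistributed, concatenate

def branch(inp):
    # same per-camera structure as above, written functionally
    x = TimeDistributed(Conv2D(40, (3, 3), padding='same', data_format='channels_first'))(inp)
    x = Activation('relu')(x)
    x = TimeDistributed(MaxPooling2D(pool_size=(2, 2), data_format='channels_first'))(x)
    return Dropout(0.2)(x)

izda_in = Input(shape=(None, 1, 8, 10))
dcha_in = Input(shape=(None, 1, 8, 10))
frt_in = Input(shape=(None, 1, 8, 10))

# concatenate along the channel axis (axis=2 once the time axis is present)
merged = concatenate([branch(izda_in), branch(dcha_in), branch(frt_in)], axis=2)

x = TimeDistributed(Flatten())(merged)
out = LSTM(240, return_sequences=True)(x)

model = Model(inputs=[izda_in, dcha_in, frt_in], outputs=out)
model.compile(loss='mse', optimizer='adam')
model.summary()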