keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

CNN to manage sequences of grayscale images like video files #7927

Closed rcrespocano closed 7 years ago

rcrespocano commented 7 years ago

Hello,

I'm having trouble setting up a CNN, and possibly an LSTM-RNN, correctly.

I'm trying to solve the following problem. I have 20000 chunks of data, each composed of a sequence of 20 images. Each image is 40x40 and in grayscale. Each chunk has one of two possible labels, 0 or 1, so the task is to classify every chunk into one of two sets. As they are numpy arrays, the shape of each one is:

I want to train a CNN to predict whether the label of a chunk of data is 0 or 1. Remember that each chunk is composed of 20 grayscale images of 40x40. Most importantly, the order of those 20 images matters: they have to be treated as a 'little' video file of 20 frames. If the order of the 20 images in a chunk changes, the label may be different.

I don't know whether it is mandatory to use an RNN. I've read some papers where the researchers used CNNs both with and without LSTM to solve similar problems. I don't think it is mandatory, but I think I have to combine a CNN with TimeDistributed. Am I right?

Thank you so much in advance.

Kind regards, Rubén

ahrnbom commented 7 years ago

I don't think this question should be a Keras Issue, as it has nothing to do with the development of Keras as a library.

That said, I think a convolutional LSTM could fit here. It exists in Keras, although I am not sure about the implementation (see this Issue: https://github.com/fchollet/keras/issues/7918). It works like a normal convolutional layer, which lets it learn spatiotemporal patterns effectively.

You can also use 3D convolutions, which answers the question of whether an RNN is mandatory (it is not). However, RNNs are designed to model long temporal sequences, which sounds like what you're describing.

A common implementation is to start with convolutional layers that work frame by frame (using TimeDistributed). Once the spatial dimensions have been scaled down to something very small, apply a TimeDistributed Flatten layer followed by a normal (non-spatial) LSTM. I think this way of thinking is a bit flawed, but depending on the data it might work well. It assumes that you can get a compact representation of each video frame independently, and then draw temporal conclusions from the sequence of those compact representations. That works in some cases and not in others.
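The three options mentioned here could be sketched roughly as follows. This is a minimal sketch, not a recommended architecture: it assumes `tensorflow.keras` (TensorFlow 2.x), channels-last frames of shape (20, 40, 40, 1), and a single sigmoid output for the 0/1 label. The function names, layer counts and filter sizes are illustrative choices and do not come from the thread.

```python
from tensorflow.keras import layers, models

# Assumed layout: (timesteps, rows, cols, channels), i.e. channels_last.
TIMESTEPS, ROWS, COLS, CHANNELS = 20, 40, 40, 1
INPUT_SHAPE = (TIMESTEPS, ROWS, COLS, CHANNELS)


def build_cnn_lstm():
    """Per-frame CNN (wrapped in TimeDistributed) followed by a plain LSTM."""
    return models.Sequential([
        layers.Input(shape=INPUT_SHAPE),
        layers.TimeDistributed(layers.Conv2D(16, (3, 3), activation="relu")),
        layers.TimeDistributed(layers.MaxPooling2D((2, 2))),
        layers.TimeDistributed(layers.Conv2D(32, (3, 3), activation="relu")),
        layers.TimeDistributed(layers.MaxPooling2D((2, 2))),
        # One feature vector per frame, so the LSTM sees (timesteps, features).
        layers.TimeDistributed(layers.Flatten()),
        layers.LSTM(32),
        layers.Dense(1, activation="sigmoid"),
    ])


def build_conv_lstm():
    """Convolutional LSTM: the recurrence itself operates on feature maps."""
    return models.Sequential([
        layers.Input(shape=INPUT_SHAPE),
        layers.ConvLSTM2D(16, (3, 3), return_sequences=False),
        layers.GlobalAveragePooling2D(),
        layers.Dense(1, activation="sigmoid"),
    ])


def build_conv3d():
    """3D convolutions: time is treated as a third spatial axis, no RNN."""
    return models.Sequential([
        layers.Input(shape=INPUT_SHAPE),
        layers.Conv3D(16, (3, 3, 3), activation="relu"),
        layers.MaxPooling3D((2, 2, 2)),
        layers.Conv3D(32, (3, 3, 3), activation="relu"),
        layers.GlobalAveragePooling3D(),
        layers.Dense(1, activation="sigmoid"),
    ])


if __name__ == "__main__":
    for build in (build_cnn_lstm, build_conv_lstm, build_conv3d):
        model = build()
        print(build.__name__, model.output_shape)
```

Each builder returns an uncompiled model that maps one 20-frame chunk to a single probability. For a 0/1 label, compiling with `loss="binary_crossentropy"` would be a natural choice.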

Hope this helps you somewhat.

rcrespocano commented 7 years ago

Thank you for your reply @ahrnbom

I described my problem in a general way, not tied to the Keras library, because of my lack of knowledge. Sorry.

Regarding the Keras API, I'm getting the following error:

```
Traceback (most recent call last):
  File "/home/rcc/.local/lib/python3.5/site-packages/tensorflow/python/framework/common_shapes.py", line 654, in _call_cpp_shape_fn_impl
    input_tensors_as_shapes, status)
  File "/usr/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/home/rcc/.local/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Negative dimension size caused by subtracting 3 from 1 for 'time_distributed_1/convolution' (op: 'Conv2D') with input shapes: [?,1,40,40], [3,3,40,40].
```

This is my "toying" code:

```python
import keras
from keras.layers import Dense, Dropout, LSTM
from keras.layers import Conv2D, Flatten
from keras.models import Sequential
from keras.layers.wrappers import TimeDistributed
from keras.models import Model
import numpy as np

number_of_samples = 2500
number_of_test_samples = 2000
timesteps = 20
frame_row = 40
frame_col = 40
channels = 1
output_label_size = 1
epochs = 1
batch_size = 32

data = np.random.random((number_of_samples, timesteps, channels, frame_row, frame_col))
label = np.random.random((number_of_samples, timesteps, output_label_size))

X_train = data[0:number_of_test_samples,:]
y_train = label[0:number_of_test_samples]
X_test = data[number_of_test_samples:,:]
y_test = label[number_of_test_samples:,:]

input_shape = (timesteps, channels, frame_row, frame_col)

print('Input shape: ', input_shape)
print('X_train shape: ', X_train.shape)
print('y_train shape: ', y_train.shape)
print('X_test shape: ', X_test.shape)
print('y_test shape: ', y_test.shape)

model = Sequential()
model.add(TimeDistributed(Conv2D(40, (3, 3), activation='relu'), input_shape=input_shape))
model.add(TimeDistributed(Dropout(0.2)))
model.add(TimeDistributed(Conv2D(20, (3, 3), activation='relu')))
model.add(TimeDistributed(Dropout(0.2)))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(30, return_sequences = True))
model.add(Dropout(0.2))
model.add(LSTM(15))
model.add(Dropout(0.2))
model.add(Dense(output_label_size, init='uniform'))
model.compile(optimizer='adam', loss='mse')
```
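A note on the error: the reported shapes are consistent with a channel-ordering mismatch. Each frame is laid out as (channels, frame_row, frame_col) = (1, 40, 40), but with the TensorFlow backend Keras reads frames as (rows, cols, channels) unless configured otherwise. The frame is then treated as 1 row, 40 columns and 40 channels, which matches the kernel shape [3,3,40,40] (kernel rows, kernel cols, input channels, filters), and a 3x3 kernel cannot fit in a single row. The plain-Python sketch below only reproduces that shape arithmetic; the helper name `conv2d_valid_output` is hypothetical and is not part of Keras.

```python
def conv2d_valid_output(frame_shape, kernel_size, filters, data_format):
    """Output shape of a stride-1 'valid' 2D convolution on one frame.

    Hypothetical helper for illustration only; it mimics the shape rule
    out = in - kernel + 1 and fails when the result is not positive.
    """
    if data_format == "channels_last":
        rows, cols, _channels = frame_shape
    elif data_format == "channels_first":
        _channels, rows, cols = frame_shape
    else:
        raise ValueError("unknown data_format: %r" % (data_format,))

    out_rows = rows - kernel_size[0] + 1
    out_cols = cols - kernel_size[1] + 1
    if out_rows <= 0 or out_cols <= 0:
        raise ValueError(
            "kernel %s does not fit a %dx%d frame" % (kernel_size, rows, cols)
        )

    if data_format == "channels_last":
        return (out_rows, out_cols, filters)
    return (filters, out_rows, out_cols)


if __name__ == "__main__":
    # Layout used in the code above, read with the channels_last default.
    try:
        conv2d_valid_output((1, 40, 40), (3, 3), 40, "channels_last")
    except ValueError as exc:
        print("fails as in the traceback:", exc)

    # Option 1: move the channel axis last, i.e. frames of shape (40, 40, 1).
    print(conv2d_valid_output((40, 40, 1), (3, 3), 40, "channels_last"))

    # Option 2: keep (1, 40, 40) and declare the layout as channels_first.
    print(conv2d_valid_output((1, 40, 40), (3, 3), 40, "channels_first"))
```

In Keras terms, the two options would correspond to building the data as (samples, timesteps, frame_row, frame_col, channels), or to passing `data_format='channels_first'` to each `Conv2D`. Whether the second option runs may depend on the backend and device.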

ahrnbom commented 7 years ago

My point was that the Keras Issue tracker is meant to track issues with the Keras library itself. If you cannot get something up and running in Keras, that is not a problem with the Keras library, but rather a problem with your lack of information. That is something you should bring up somewhere else, like a forum.

Zaibali9999 commented 2 years ago

```python
model = tf.keras.models.Sequential([
    Conv2D(16, (3, 3), activation='relu', input_shape=(img_height, img_width, 2)),  # filters, kernel_size, activation, input_shape
    MaxPooling2D(2, 2),

    Conv2D(32, (3, 3), activation='relu'),
    MaxPooling2D(2, 2),

    Conv2D(64, (3, 3), activation='relu'),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(2, 2),

    Conv2D(128, (3, 3), activation='relu'),
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D(2, 2),

    Conv2D(256, (3, 3), activation='relu'),
    Conv2D(256, (3, 3), activation='relu'),
    Conv2D(256, (3, 3), activation='relu'),
    MaxPooling2D(2, 2),

    Flatten(),
    Dense(512, activation='relu'),
    Dense(512, activation='relu'),
    Dense(3, activation='softmax')
])
model.summary()
```

Is this model right for grayscale images?
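Two things seem worth checking for this model. First, a grayscale image normally has a single channel, so the last entry of `input_shape` would usually be 1 rather than 2, unless two channels are stacked on purpose. Second, the stack uses unpadded 3x3 convolutions and five 2x2 poolings, so small images shrink to nothing before `Flatten`. The sketch below traces only the spatial size through the layers, assuming the Keras defaults (`padding='valid'`, convolution stride 1, pooling stride equal to the pool size). `STACK` and `trace_spatial_size` are hypothetical names introduced for illustration.

```python
# One entry per Conv2D / MaxPooling2D layer of the model above, in order.
STACK = [
    ("conv", 3), ("pool", 2),
    ("conv", 3), ("pool", 2),
    ("conv", 3), ("conv", 3), ("pool", 2),
    ("conv", 3), ("conv", 3), ("pool", 2),
    ("conv", 3), ("conv", 3), ("conv", 3), ("pool", 2),
]


def trace_spatial_size(size, stack=STACK):
    """Return the side length after every layer for a square input.

    Raises ValueError as soon as a layer would produce an empty output.
    """
    sizes = [size]
    for index, (kind, k) in enumerate(stack):
        if kind == "conv":
            size = size - k + 1   # 'valid' convolution, stride 1
        else:
            size = size // k      # 'valid' pooling, stride == pool size
        if size <= 0:
            raise ValueError(
                "layer %d (%s) gets an input that is too small" % (index, kind)
            )
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    # A 224x224 image survives the whole stack.
    print(trace_spatial_size(224))

    # A 128x128 image is too small for this particular stack.
    try:
        trace_spatial_size(128)
    except ValueError as exc:
        print("128x128:", exc)
```

If the images turn out to be too small, using `padding='same'` in the convolutions or dropping some pooling layers are common ways to keep the feature maps from collapsing.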