jamesmf / mnistCRNN

Simple TimeDistributed() wrapper Demo in Keras; sums images of MNIST digits

Error of input shape #4

Closed dakshvar22 closed 8 years ago

dakshvar22 commented 8 years ago

Hi, I am running your code on Keras 1.0.2. I get an error when I use the following code, which is almost identical to yours -

model.add(TimeDistributed(Convolution2D(8, 4, 4, border_mode='valid', batch_input_shape=(2710,3,32,32))))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2), border_mode='valid')))
model.add(Activation('relu'))
model.add(TimeDistributed(Convolution2D(8, 3, 3, border_mode='valid')))
model.add(Activation('relu'))
model.add(TimeDistributedFlatten())
model.add(Activation('relu'))
model.add(GRU(output_dim=100, return_sequences=True))
model.add(GRU(output_dim=50, return_sequences=False))
model.add(Dropout(.2))
model.add(Dense(1))

I get the following error -

File "/home/toothless/daksh/ml/project/src/model/model2.py", line 133, in buildModel model.add(TimeDistributed(Convolution2D(8, 4, 4, border_mode='valid', input_shape=(2710,3,32,32)))) File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 102, in add raise Exception('The first layer in a Sequential model must ' Exception: The first layer in a Sequential model must get an input_shape or batch_input_shape argument.

I have initialised my model with model = Sequential(), and I have specified the input shape already, but it still gives this error while building the model itself. I am not even calling model.fit yet. Please help me out.

jamesmf commented 8 years ago

I believe your problem is specifying batch_input_shape. That keyword is reserved for when the number of examples per batch is fixed - in that case, the first number (2710) is interpreted as how many examples each batch will contain.

So your model isn't getting enough dimensions specified, as 2710 is interpreted as the number of examples per batch, not the number of timesteps. I assume you want 2710 timesteps? In that case you should change it to input_shape=(2710,3,32,32).
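For illustration, a minimal sketch of the difference (Keras 1.x API, Theano dim ordering, shapes taken from this thread; the wrapper-level input_shape is the form that ends up working further down):

```python
from keras.models import Sequential
from keras.layers import Convolution2D
from keras.layers.wrappers import TimeDistributed

model = Sequential()
# input_shape excludes the batch axis: (timesteps, channels, rows, cols)
model.add(TimeDistributed(Convolution2D(8, 4, 4, border_mode='valid'),
                          input_shape=(2710, 3, 32, 32)))

# batch_input_shape would instead fix the batch size as the first axis:
#   batch_input_shape=(batch_size, 2710, 3, 32, 32)
# and is only needed when every batch must have the same size
# (e.g. for stateful RNNs).
```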

dakshvar22 commented 8 years ago

@jamesmf That problem was with the parentheses. I solved it by changing that line to -

model.add(TimeDistributed(Convolution2D(8, 4, 4, border_mode='valid'), input_shape=(2710,3,32,32)))

But, I get the following error now -

File "/home/toothless/daksh/ml/project/src/model/model2.py", line 133, in buildModel model.add(TimeDistributed(Convolution2D(8, 4, 4, border_mode='valid'), input_shape=(2710,3,32,32))) File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 110, in add layer.create_input_layer(batch_input_shape, input_dtype) File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 337, in create_input_layer dtype=input_dtype, name=name) File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 1026, in Input name=name, input_dtype=dtype) File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 957, in init shape = input_tensor._keras_shape AttributeError: 'TensorVariable' object has no attribute '_keras_shape'

Any ideas why this error pops up? I have searched the issues page of the Keras project but haven't found a solution yet. Any suggestions?

jamesmf commented 8 years ago

input_shape should be inside the Convolution2D(), not inside the TimeDistributed(). You've closed the parentheses on the Convolution2D() call before input_shape.

the original code looks like:

model.add(TimeDistributed(Convolution2D(8, 4, 4, border_mode='valid', input_shape=(maxToAdd,1,size,size))))

EDIT: this seems to no longer be true, given the example in https://github.com/fchollet/keras/blob/master/keras/layers/wrappers.py

dakshvar22 commented 8 years ago

@jamesmf When I do it the way you showed -

model.add(TimeDistributed(Convolution2D(8, 4, 4, border_mode='valid', input_shape=(10, 3, 299, 299))))

I get the original error which I raised in the issue -

Exception: The first layer in a Sequential model must get an input_shape or batch_input_shape argument.

I am running on Keras 1.0.2 with Theano as the backend. Also, I cloned your repo and ran the addMNISTrnn.py script; it gave me the same error as above.

I am confused what to do now.

jamesmf commented 8 years ago

It seems to have broken with 1.0.2, I will look into this as soon as I can.

That said, I get a different error (the Reshape layer isn't working).

dakshvar22 commented 8 years ago

@jamesmf Can you tell me which Keras version this works with?

jamesmf commented 8 years ago

The previous version works with 0.3.2, but I have just pushed changes that alter the Reshape layer to work with 1.0.2. Please let me know if you can get the example working with 1.0.2.

dakshvar22 commented 8 years ago

@jamesmf Yes, it works now. Can you help me out with my specific problem now? I have 25 videos in my training data. Each video consists of 2721 frames with a frame size of 3 x 32 x 32 (3 channels). I want to feed one video (all frames at once) into the CNN layers, which then connect to my LSTM layer (not bidirectional for now, but needed later). This is why I needed a TimeDistributed version of Convolution2D. For my Convolution2D I am using 32 filters of size 3 x 3. The sequence length for my LSTM would be 2721 (the number of frames in a video). The output of the whole network should be a vector of length 2721; basically it is some sort of regression problem.

Right now I have this architecture set up -

```python
def buildModel():
    model = Sequential()  # the model initialisation mentioned earlier

    model.add(TimeDistributed(Convolution2D(32, 3, 3, border_mode='valid'),
                              batch_input_shape=(1, 2721, img_channels, img_rows, img_cols)))
    model.add(keras.layers.normalization.BatchNormalization())
    model.add(Activation('relu'))
    model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
    model.add(Dropout(0.25))

    model.add(TimeDistributed(Convolution2D(32, 3, 3, border_mode='valid')))
    model.add(keras.layers.normalization.BatchNormalization())
    model.add(Activation('relu'))
    model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
    model.add(Dropout(0.25))

    model.add(TimeDistributed(Convolution2D(32, 3, 3, border_mode='valid')))
    model.add(keras.layers.normalization.BatchNormalization())
    model.add(Activation('relu'))
    model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
    model.add(Dropout(0.25))

    model.add(TimeDistributed(Flatten()))
    model.add(TimeDistributed(Dense(256, activation='relu')))
    model.add(TimeDistributed(Dense(128, activation='relu')))

    model.add(LSTM(output_dim=255, return_sequences=True))

    model.add(TimeDistributed(Dense(2710, activation='sigmoid')))

    sgd = SGD(lr=0.1, decay=1e-5, momentum=0.9, nesterov=True)
    model.compile(loss='mean_absolute_error',
                  optimizer=sgd,
                  metrics=['accuracy'])
```

I am having trouble with preparing the input to the network. I have successfully compiled the model. My input training data (X_train) has dimensions 25 x 2721 x 3 x 32 x 32 and my output training data (Y_train) is 25 x 2721.

When I pass this data to the fit function with the call -

model.fit(X_train,Y_train,batch_size=1,nb_epoch=5,shuffle=True,validation_split=0.2)

I get this error -

TypeError: ('Bad input argument to theano function with name "/usr/local/lib/python2.7/dist-packages/keras/backend/theano_backend.py:503" at index 1(0-based)', 'Wrong number of dimensions: expected 3, got 2 with shape (1, 2721).')

I hope the problem is clear. Can you help me out with getting the data arranged for the required problem please?

jamesmf commented 8 years ago

I'm not sure this is the issue, but can you try using train_on_batch() with a single video instead of fit() with batch_size=1?
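A minimal sketch of that loop, assuming X_train and Y_train are numpy arrays as described above (the epoch count and variable names are illustrative; Y_train's final shape is worked out further down in the thread):

```python
nb_epoch = 5  # illustrative
for epoch in range(nb_epoch):
    for i in range(X_train.shape[0]):
        x = X_train[i:i + 1]  # one video: shape (1, 2721, 3, 32, 32)
        y = Y_train[i:i + 1]  # its per-frame targets
        loss = model.train_on_batch(x, y)
```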

Also I'm going to close the issue since the example is working with 1.0.2 (thanks for bringing it to my attention), but I'll keep trying to help.

jamesmf commented 8 years ago

Also, is there a reason you're specifying batch_input_shape instead of input_shape? In my experience batch_input_shape is primarily useful for stateful RNNs (which you might want if you choose to chop your video into sequences, but which you don't need right now).

dakshvar22 commented 8 years ago

@jamesmf Okay, so I changed batch_input_shape to input_shape. Regarding the "Bad input argument" error, I found a similar question here - http://stackoverflow.com/questions/32034231/lasagne-theano-wrong-number-of-dimensions

According to that link, the problem is with the dimensions of the Y_train vector, which is currently 25 x 2721 (i.e. batch_size x 2721), but the network expects a 3-dimensional tensor. Any ideas why?

jamesmf commented 8 years ago

Have you tried reshaping that to 25x2721x1?
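A minimal numpy sketch of that reshape, assuming Y_train is a numpy array of shape (25, 2721):

```python
import numpy as np

# add a trailing feature axis: (batch, timesteps) -> (batch, timesteps, 1),
# matching the network's per-timestep, one-feature output
Y_train = np.expand_dims(Y_train, axis=-1)
print(Y_train.shape)  # (25, 2721, 1)
```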

dakshvar22 commented 8 years ago

Reshaping Y_train did the trick, I think. But I run into some sort of memory issue now.

Here is the stack trace -

Error allocating 2775420 bytes of device memory (out of memory). Driver report 2277376 bytes free and 2147352576 bytes total

Traceback (most recent call last):
  File "/home/toothless/daksh/ml/project/src/model/model2.py", line 139, in <module>
    model.train_on_batch(X_train[0:1],Y_train[0:1])
  File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 493, in train_on_batch
    class_weight=class_weight)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1161, in train_on_batch
    outputs = self.train_function(ins)
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/theano_backend.py", line 507, in __call__
    return self.function(*inputs)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 871, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/usr/local/lib/python2.7/dist-packages/theano/gof/link.py", line 314, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 859, in __call__
    outputs = self.fn()
MemoryError: Error allocating 2775420 bytes of device memory (out of memory).
Apply node that caused the error: GpuElemwise{add,no_inplace}(GpuDot22.0, GpuDimShuffle{x,0}.0)
Toposort index: 483
Inputs types: [CudaNdarrayType(float32, matrix), CudaNdarrayType(float32, row)]
Inputs shapes: [(2721, 255), (1, 255)]
Inputs strides: [(255, 1), (0, 1)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[GpuReshape{3}(GpuElemwise{add,no_inplace}.0, TensorConstant{[ -1 2721 255]})]]
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

My device memory (RAM) still has around 3 GB unallocated when this exception is thrown. Confused again.

jamesmf commented 8 years ago

Not too surprising - that's a very large computation (2721 timesteps is a lot, and each one carries a full frame). That's why I thought you might end up dealing with sequences instead of full videos. You could also try downsampling and not using every frame, or subsample windows within each frame. That might help with both the memory problems and increasing the size of your training data.

dakshvar22 commented 8 years ago

Okay, say for a particular frame I only want my LSTM layer to look at the neighbouring 100 frames (so a sequence of 100) and not all 2721 frames - how would I do that, if it's possible? I hope I am making some sense, because it's really late here in India :stuck_out_tongue:

jamesmf commented 8 years ago

Slice the data that way. Instead of 25 examples, take 100-frame samples from each video and use those as your input. I'm not sure how that will affect your output, because I don't know your goal. But if there aren't very long-range dependencies in your videos, you should be okay.

You can sample randomly or take every possible 100-frame segment. If you want to further increase the robustness of your data, check out resources on data augmentation in image space.
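A minimal sketch of that slicing (window_len and stride are illustrative choices, not values from this thread):

```python
import numpy as np

def make_windows(X, Y, window_len=100, stride=50):
    """Cut each video (and its per-frame scores) into fixed-length windows."""
    xs, ys = [], []
    for video, scores in zip(X, Y):
        for start in range(0, video.shape[0] - window_len + 1, stride):
            xs.append(video[start:start + window_len])
            ys.append(scores[start:start + window_len])
    return np.array(xs), np.array(ys)

# e.g. with X_train: (25, 2721, 3, 32, 32) and Y_train: (25, 2721, 1):
# X_win, Y_win = make_windows(X_train, Y_train)
```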

dakshvar22 commented 8 years ago

@jamesmf So the final goal is to create a summary video of any given video. My training data has a score for each frame of each video; this score indicates the importance of that frame within the video. That's why I said in the beginning that I would also need bidirectional LSTMs. A 100-frame dependency (backward and forward) should be enough, shouldn't it? I am not sure the first approach you proposed would be applicable here.

jamesmf commented 8 years ago

I see no theoretical problem with each example being (2 x num_dependent_frames + 1) x num_channel x num_row x num_col, with the middle (101st) frame being the frame you care about. Or, since that's inefficient, (2 x num_dependent_frames + k) x num_channel x num_row x num_col, where k is the number of frames you want to score at once.

That's just how I imagine you might do it. I'm not sure how large a segment you need to capture the importance of each frame.
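A minimal sketch of that centered windowing (the function and parameter names are illustrative; padding or clamping at the video boundaries is omitted):

```python
def centered_window(video, t, num_dependent_frames=100):
    # frames [t - n, t + n] around the frame of interest at index t;
    # assumes t is at least num_dependent_frames from either end
    lo = t - num_dependent_frames
    hi = t + num_dependent_frames + 1
    return video[lo:hi]  # length 2 * num_dependent_frames + 1
```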

dakshvar22 commented 8 years ago

Yup, that seems possible. Although, for starters I downsampled my frames, keeping half of the original frames with their corresponding scores. That seems to have gotten rid of the memory error. But now I run into a dimension mismatch problem. My last output layer has 1361 neurons, as that is the number of frames we are considering for one video. But since we have reshaped our Y_train to 25 x 1361 x 1, I get the following error -

ValueError: GpuElemwise. Input dimension mis-match. Input 1 (indices start at 0) has shape[2] == 1, but the output's size on that axis is 1361.

I need to do an appropriate reshaping of my Y_train. Can you help me with that?

jamesmf commented 8 years ago

The Dense() in the last TimeDistributed should be Dense(1). That layer has only reduced your output to num_timesteps x 1361; you want it to reduce to num_timesteps x 1.

TimeDistributed takes care of the time dimension, so your Dense layer just needs to match your per-timestep output (1-D).
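In code, the fix is one line at the end of the model (a sketch; the sigmoid activation is carried over from the architecture above and assumes the scores lie in [0, 1]):

```python
# one score per timestep: output shape (batch, num_timesteps, 1),
# matching targets of shape (N, num_timesteps, 1)
model.add(TimeDistributed(Dense(1, activation='sigmoid')))
```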

dakshvar22 commented 8 years ago

Yeah, I just thought so. Finally my training has started. Thanks for all the help; I'll try to fine-tune the hyperparameters now. Can I have your email ID, as this thread is getting very cluttered?

dakshvar22 commented 8 years ago

@jamesmf Can you please help me out on this thread? - https://github.com/fchollet/keras/issues/2646 Thanks.