keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

Use LSTM for recurrent convolutional network without time distributed wrapper #9232

Closed: mdlockyer closed this issue 6 years ago

mdlockyer commented 6 years ago

I am facing an issue with a project I'm currently working on. I am attempting to build a many-to-many model that takes a series of images and classifies them. That part is relatively straightforward: I have a model built with Keras that uses convolutional layers inside the TimeDistributed wrapper, feeding into an LSTM, and it works fine. The complexity in my current project comes from the fact that this model needs to be converted to CoreML for deployment. I feel I'm up against a wall with this, so any help would be a life saver.

As I said previously, my current model is trainable using the TimeDistributed wrapper, but this doesn't seem to be supported by CoreML. I have seen some examples using an LSTM with CoreML where the LSTM states are passed in and out of the model with each item in the sequence. This essentially creates a recurrent network that takes only a single item from the sequence (along with the LSTM states from the previous prediction) as input, rather than the whole sequence at once. That LSTM state loop (for lack of a better term) seems to be the best option, as CoreML doesn't support sequential image inputs. My issue then comes from training: how can I train my network properly on sequential data, then convert it to CoreML?

If I remove the time distribution from the non-LSTM layers, the model won't compile because it's missing the extra time dimension. Essentially, the catch here is I can't remove the time distribution wrappers as the model isn't functional without the inclusion of time steps, and I can't convert to CoreML while they are present.

Does anyone have any ideas on how to do this? I hope this question is understandable. It's quite late and I've been working on this for 20+ hours straight so I'm a bit fried at the moment. Thanks in advance for any input, thoughts, or ideas provided. Cheers!
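To make that concrete, a single-step model of the kind I mean might look roughly like this sketch (the feature size, LSTM width, and class count below are placeholders, not my real values):

from keras.layers import Input, LSTM, Dense
from keras.models import Model

features = 256    # hypothetical per-frame feature size
units = 256       # hypothetical LSTM width

frame_input = Input(shape=(1, features))   # one timestep at a time
h_in = Input(shape=(units,))               # previous hidden state
c_in = Input(shape=(units,))               # previous cell state

# return_state=True exposes the updated states as extra outputs,
# so they can be fed back in on the next call
lstm_out, h_out, c_out = LSTM(units, return_state=True)(
    frame_input, initial_state=[h_in, c_in])
prediction = Dense(10, activation='sigmoid')(lstm_out)  # 10 classes, hypothetical

step_model = Model([frame_input, h_in, c_in], [prediction, h_out, c_out])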

My model:

from keras.layers import (Input, TimeDistributed, Conv2D, MaxPooling2D,
                          Flatten, Dropout, LSTM, Dense)
from keras.models import Model

image_input = Input(shape=(max_sequence_length, 224, 224, 3))

convolutional_1 = TimeDistributed(Conv2D(64, (3, 3), activation='relu', data_format='channels_last'))(image_input)
pooling_1 = TimeDistributed(MaxPooling2D((2, 2), strides=(1, 1)))(convolutional_1)

convolutional_2 = TimeDistributed(Conv2D(128, (4, 4), activation='relu'))(pooling_1)
pooling_2 = TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2)))(convolutional_2)

convolutional_3 = TimeDistributed(Conv2D(256, (4, 4), activation='relu'))(pooling_2)
pooling_3 = TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2)))(convolutional_3)

flatten_1 = TimeDistributed(Flatten())(pooling_3)
dropout_1 = TimeDistributed(Dropout(0.5))(flatten_1)

lstm_1 = LSTM(256, return_sequences=True, return_state=False, stateful=False, dropout=0.5)(dropout_1)

dense_1 = TimeDistributed(Dense(num_classes, activation='sigmoid'))(lstm_1)

model = Model(inputs=image_input, outputs=dense_1)

I wanted to add that I have seen some posts where it seems people were using time distribution wrappers with CoreML; however, when I try to convert my model, it raises this error as soon as it hits the first wrapper:

"AttributeError: The layer has never been called and thus has no defined output shape."

I have modified the Keras -> CoreML conversion script to handle a 4D input for the image sequence (although I haven't been able to test whether it works as expected, since I can't convert my model), so if I can get the model to convert with the time distribution layers in place, it should be functional.

Link to an Apple article discussing RNNs in CoreML

Link to a GitHub repo with an implementation of an LSTM RNN

fchollet commented 6 years ago

If your sequences have a fixed length, it is possible to replicate TimeDistributed with a Python loop, without using custom layers, TimeDistributed, or Lambda (thus it will be compatible with CoreML).

from keras import layers
from keras.models import Model

convnet = ...  # your base conv model
timestep_inputs = [layers.Input(...) for _ in range(num_timesteps)]
conv_outputs = []
for x in timestep_inputs:
    y = convnet(x)
    conv_outputs.append(y)
x = layers.concatenate(conv_outputs, axis=1)
y = layers.LSTM(...)(x)

model = Model(timestep_inputs, y)

You could greatly simplify this by having a Lambda layer that takes a single input tensor and decomposes it into n timesteps.
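As a rough sketch of that idea (the sequence length and input shape here are placeholders; note also that Lambda itself won't convert to CoreML, so this only simplifies the training-side graph):

from keras import layers

num_timesteps = 10  # placeholder
seq_input = layers.Input(shape=(num_timesteps, 224, 224, 3))

# Slice the single 5D sequence input into one 4D image tensor per timestep;
# these slices can then replace timestep_inputs in the loop above
frames = [layers.Lambda(lambda t, i=i: t[:, i])(seq_input)
          for i in range(num_timesteps)]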

mdlockyer commented 6 years ago

Thank you very much for the information! Very helpful.

Any idea why this is giving me the wrong dimensions for the LSTM layer? When I build it, I get:

ValueError: Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=2

from keras.layers import (Input, Conv2D, MaxPooling2D, Flatten, Dropout,
                          LSTM, concatenate)
from keras.models import Model

conv_input = Input(shape=(224, 224, 3))

convolutional_1 = Conv2D(64, (3, 3), activation='relu', data_format='channels_last')(conv_input)
pooling_1 = MaxPooling2D((2, 2), strides=(1, 1))(convolutional_1)

convolutional_2 = Conv2D(128, (4, 4), activation='relu')(pooling_1)
pooling_2 = MaxPooling2D((2, 2), strides=(2, 2))(convolutional_2)

convolutional_3 = Conv2D(256, (4, 4), activation='relu')(pooling_2)
pooling_3 = MaxPooling2D((2, 2), strides=(2, 2))(convolutional_3)

flatten_1 = Flatten()(pooling_3)
dropout_1 = Dropout(0.5)(flatten_1)

convnet = Model(inputs=conv_input, outputs=dropout_1)

image_input = Input(shape=(224, 224, 3))
timestep_inputs = [image_input for _ in range(num_timesteps)]
conv_outputs = []
for x in timestep_inputs:
    y = convnet(x)
    conv_outputs.append(y)
x = concatenate(conv_outputs, axis=1)
y = LSTM(64, return_sequences=True, return_state=False, stateful=False, dropout=0.5)(x)

model_block_1 = Model(inputs=timestep_inputs, outputs=y)

It looks like it should work! Maybe an issue with the concat?

StevenLOL commented 6 years ago

Hi @fchollet, is there a way to join the outputs of a TimeDistributed Flatten layer? E.g. from [None, None, 100] to [None, 1000] if the number of timesteps is 10?
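A fixed-length Reshape would be one way to get there, sketched here with placeholder sizes (this assumes the timestep count can be fixed rather than None):

from keras import layers

timesteps, features = 10, 100                    # placeholder sizes
x = layers.Input(shape=(timesteps, features))    # (None, 10, 100)
y = layers.Reshape((timesteps * features,))(x)   # (None, 1000)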

teaglin commented 6 years ago

I'm getting the same issue as Michael. @fchollet, your solution doesn't work. Are there any other options?

OriAlpha commented 4 years ago

> Any idea why this is giving me the wrong dimensions for the LSTM layer? [quoting @mdlockyer's comment and code above]

It's because the LSTM takes 3-dimensional input; you need to reshape the final layer in the base model instead of flattening it.
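A minimal sketch of that fix (placeholder sizes throughout): end the base model with a Reshape so each per-timestep output has shape (1, features), and the concatenated result becomes the 3D tensor the LSTM expects.

from keras.layers import (Input, Conv2D, MaxPooling2D, Flatten, Dropout,
                          Reshape, LSTM, concatenate)
from keras.models import Model

num_timesteps = 10  # placeholder sequence length

conv_input = Input(shape=(224, 224, 3))
x = Conv2D(64, (3, 3), activation='relu')(conv_input)
x = MaxPooling2D((2, 2))(x)
x = Flatten()(x)
x = Dropout(0.5)(x)
# Add back a time axis: (features,) -> (1, features)
x = Reshape((1, -1))(x)
convnet = Model(inputs=conv_input, outputs=x)

# One distinct Input per timestep (reusing a single Input tensor would
# make the model's input list redundant)
timestep_inputs = [Input(shape=(224, 224, 3)) for _ in range(num_timesteps)]
conv_outputs = [convnet(frame) for frame in timestep_inputs]

# Concatenating along axis=1 now yields (batch, num_timesteps, features),
# the 3D input the LSTM expects
sequence = concatenate(conv_outputs, axis=1)
lstm_out = LSTM(64, return_sequences=True, dropout=0.5)(sequence)

model = Model(inputs=timestep_inputs, outputs=lstm_out)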