mlech26l / ncps

PyTorch and TensorFlow implementation of NCP, LTC, and CfC wired neural models
https://www.nature.com/articles/s42256-020-00237-3
Apache License 2.0
1.86k stars 297 forks

dimension of the input #22

Closed ChenK19 closed 2 years ago

ChenK19 commented 2 years ago

Hi,

We are following your work, and I have a question: how should we reshape our image sequence of dimension (10000, 384, 640, 3) to fit the input requirement of the LTC model? Are (10000, 1, 384, 640, 3) and (10000, 6, 384, 640, 3) equivalent as input data?

Thanks for your reply.

mlech26l commented 2 years ago

Hi,

The input dimension of an LTC is (batch size, sequence length, input features). (10000, 1, 384, 640, 3) would run the RNN for 1 timestep, i.e., on a single image, whereas (10000, 6, 384, 640, 3) runs it for 6 timesteps, i.e., on a short video.
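The two shapes are therefore not interchangeable. A quick NumPy sketch of the two options (with small dummy sizes in place of the 10000 × 384 × 640 × 3 data from the question):

```python
import numpy as np

# Dummy stand-in for the frame stack described in the question:
# 12 frames of 8x10 RGB images (tiny sizes, just to show the shapes).
frames = np.zeros((12, 8, 10, 3), dtype=np.float32)

# Option 1: treat every frame as its own length-1 sequence.
single_step = frames[:, np.newaxis]  # shape (12, 1, 8, 10, 3)

# Option 2: group consecutive frames into clips of 6 timesteps,
# so the RNN sees short videos instead of isolated images.
seq_len = 6
n_clips = frames.shape[0] // seq_len
clips = frames[: n_clips * seq_len].reshape(
    n_clips, seq_len, *frames.shape[1:]
)  # shape (2, 6, 8, 10, 3)

print(single_step.shape, clips.shape)
```

With option 1 the recurrent state is reset after every frame; with option 2 it carries across the 6 frames of each clip.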

Keep in mind that the LTC does not support multi-dimensional inputs, i.e., you need to apply flattening or pooling to turn each timestep's input into a 1D feature vector. If you have image data, which seems to be your case, I suggest applying a few convolutional layers before the RNN:

from tensorflow import keras

# ncp_cell is assumed to be defined elsewhere,
# e.g., an LTCCell built from an NCP wiring.

height, width, channels = (384, 640, 3)

model = keras.models.Sequential(
    [
        keras.layers.InputLayer(input_shape=(None, height, width, channels)),
        keras.layers.TimeDistributed(
            keras.layers.Conv2D(32, (5, 5), activation="relu")
        ),
        keras.layers.TimeDistributed(keras.layers.MaxPool2D()),
        keras.layers.TimeDistributed(
            keras.layers.Conv2D(64, (5, 5), activation="relu")
        ),
        keras.layers.TimeDistributed(keras.layers.MaxPool2D()),
        keras.layers.TimeDistributed(keras.layers.Flatten()),
        keras.layers.TimeDistributed(keras.layers.Dense(32, activation="relu")),
        keras.layers.RNN(ncp_cell, return_sequences=True),
        keras.layers.TimeDistributed(keras.layers.Activation("softmax")),
    ]
)
model.compile(
    optimizer=keras.optimizers.Adam(0.01),
    loss='sparse_categorical_crossentropy',
)

The data should then have shape (10000, 6, 384, 640, 3).

If you need one prediction per video, set return_sequences=False and remove the last TimeDistributed layer. If you need a prediction at each frame, keep return_sequences=True.