mlech26l / ncps

PyTorch and TensorFlow implementation of NCP, LTC, and CfC wired neural models
https://www.nature.com/articles/s42256-020-00237-3
Apache License 2.0
1.86k stars 297 forks

dimension of the input #22

Closed ChenK19 closed 2 years ago

ChenK19 commented 2 years ago

Hi,

We are following your work, and I have a question: how should we reshape our image sequence of dimension (10000, 384, 640, 3) to fit the input requirement of the LTC model? Are (10000, 1, 384, 640, 3) and (10000, 6, 384, 640, 3) equivalent as input data?

Thanks for your reply.

mlech26l commented 2 years ago

Hi,

The input dimension of an LTC is (batch size, sequence length, input features). (10000, 1, 384, 640, 3) would run the RNN for 1 timestep, i.e., on a single image, whereas (10000, 6, 384, 640, 3) runs it for 6 timesteps, i.e., on a short video.
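The two shapes are therefore not interchangeable. A quick NumPy sketch of the two options (with small dummy sizes in place of the 10000 × 384 × 640 × 3 data from the question):

```python
import numpy as np

# Dummy stand-in for the frame stack described in the question:
# 12 frames of 8x10 RGB images (tiny sizes, just to show the shapes).
frames = np.zeros((12, 8, 10, 3), dtype=np.float32)

# Option 1: treat every frame as its own length-1 sequence.
single_step = frames[:, np.newaxis]  # shape (12, 1, 8, 10, 3)

# Option 2: group consecutive frames into clips of 6 timesteps,
# so the RNN sees short videos instead of isolated images.
seq_len = 6
n_clips = frames.shape[0] // seq_len
clips = frames[: n_clips * seq_len].reshape(
    n_clips, seq_len, *frames.shape[1:]
)  # shape (2, 6, 8, 10, 3)

print(single_step.shape, clips.shape)
```

With option 1 the recurrent state is reset after every frame; with option 2 it carries across the 6 frames of each clip.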

Keep in mind that the LTC does not support multi-dimensional inputs, i.e., you need to apply flattening or pooling to turn each timestep's input into a 1D feature vector. If you have image data, which seems to be your case, I suggest applying a few convolutional layers before the RNN:

from tensorflow import keras

# ncp_cell is assumed to be defined elsewhere,
# e.g., an LTCCell built from an NCP wiring.

height, width, channels = (384, 640, 3)

model = keras.models.Sequential(
    [
        keras.layers.InputLayer(input_shape=(None, height, width, channels)),
        keras.layers.TimeDistributed(
            keras.layers.Conv2D(32, (5, 5), activation="relu")
        ),
        keras.layers.TimeDistributed(keras.layers.MaxPool2D()),
        keras.layers.TimeDistributed(
            keras.layers.Conv2D(64, (5, 5), activation="relu")
        ),
        keras.layers.TimeDistributed(keras.layers.MaxPool2D()),
        keras.layers.TimeDistributed(keras.layers.Flatten()),
        keras.layers.TimeDistributed(keras.layers.Dense(32, activation="relu")),
        keras.layers.RNN(ncp_cell, return_sequences=True),
        keras.layers.TimeDistributed(keras.layers.Activation("softmax")),
    ]
)
model.compile(
    optimizer=keras.optimizers.Adam(0.01),
    loss='sparse_categorical_crossentropy',
)

The data should then have shape (10000, 6, 384, 640, 3).

If you need one prediction per video, set return_sequences=False and remove the last TimeDistributed layer. If you need a prediction at each frame, keep return_sequences=True.