matterport / Mask_RCNN

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

Can someone explain to me why time distributed layers are used in Mask R-CNN? #2298

Closed: nnnetizen closed this issue 4 years ago

VictorAtPL commented 4 years ago

In Keras, when you build a sequential model, the second dimension (the one right after the sample/batch dimension) is treated as the time dimension. This means that if your data is 5-D with shape (sample, time, width, length, channel), you can take a convolutional layer, which expects 4-D input of shape (sample, width, length, channel), wrap it in TimeDistributed, and apply it along the time dimension to obtain a 5-D output.
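A minimal NumPy sketch of that idea. NumPy stands in for Keras here, and `time_distributed` and `conv1x1` are hypothetical helpers, not Keras APIs. The same wrapped function (a 1x1 convolution with one shared set of weights) is applied to every time step `x[:, t]`, and the results are stacked back along axis 1, so a 5-D input gives a 5-D output.

```python
import numpy as np

def time_distributed(layer_fn, x):
    """Hypothetical helper: apply layer_fn (which expects a 4-D
    (sample, H, W, C) input) to every time step x[:, t] and stack
    the results back along axis 1."""
    return np.stack([layer_fn(x[:, t]) for t in range(x.shape[1])], axis=1)

rng = np.random.default_rng(0)
# Weights of a hypothetical 1x1 convolution: 3 input channels -> 4 filters.
# They are created once, so every time step reuses the same weights.
w = rng.standard_normal((3, 4))
b = rng.standard_normal(4)

def conv1x1(frames):
    # frames: (sample, H, W, 3) -> (sample, H, W, 4); a 1x1 conv mixes channels per pixel
    return frames @ w + b

video = rng.standard_normal((2, 5, 8, 8, 3))  # (sample, time, width, length, channel)
out = time_distributed(conv1x1, video)
print(out.shape)  # (2, 5, 8, 8, 4)
```

The wrapped function never sees the time axis; it only ever receives 4-D slices, which is why a layer built for 4-D input can be reused on 5-D data.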

In Keras 2.0, Dense is applied only to the last dimension by default (e.g., applying Dense(10) to an input of shape (n, m, o, p) gives an output of shape (n, m, o, 10)). That's why in your case Dense and TimeDistributed(Dense) are equivalent.
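The Dense equivalence can be checked numerically with a small NumPy sketch. Plain matrix multiplication stands in for Keras `Dense`, and the sizes are arbitrary. Multiplying a 4-D input by a `(p, 10)` kernel touches only the last axis, which is exactly what applying the same kernel to each `x[:, t]` slice and re-stacking along axis 1 does.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, o, p, units = 2, 3, 4, 5, 10
x = rng.standard_normal((n, m, o, p))
W = rng.standard_normal((p, units))  # stand-in Dense kernel: last axis p -> 10 units
b = rng.standard_normal(units)

# "Dense(10)" on a 4-D input: the matmul acts on the last axis only.
dense_out = x @ W + b

# "TimeDistributed(Dense(10))": the same W, b applied to each x[:, t] slice,
# with the results stacked back along axis 1.
td_out = np.stack([x[:, t] @ W + b for t in range(m)], axis=1)

print(dense_out.shape)                 # (2, 3, 4, 10)
print(np.allclose(dense_out, td_out))  # True
```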

Author: Anurag Garg. Source: https://intellipaat.com/community/6298/what-is-the-role-of-timedistributed-layer-in-keras?show=6431#a6431

RoIs, for example, have shape [batch, num_rois, POOL_SIZE, POOL_SIZE, channels], so to apply Conv2D to each RoI along the second dimension (num_rois), you need to use a TimeDistributed layer, I believe.
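A NumPy sketch under stated assumptions: `conv2d_same` is a hypothetical, naive stride-1 "same" convolution standing in for Keras `Conv2D`, and the sizes are small illustrative values rather than Mask R-CNN's configuration. It shows that applying the convolution to each RoI separately gives the same result as folding `num_rois` into the batch axis, convolving once, and unfolding. This is only a conceptual equivalence; Keras's own TimeDistributed may be implemented differently internally.

```python
import numpy as np

def conv2d_same(x, kernel, bias):
    """Hypothetical naive stride-1 'same' Conv2D (cross-correlation, as Keras computes).
    x: (N, H, W, C_in); kernel: (kh, kw, C_in, C_out) with odd kh, kw."""
    kh, kw, _, c_out = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x, ((0, 0), (ph, ph), (pw, pw), (0, 0)))  # zero padding
    n, h, w, _ = x.shape
    out = np.zeros((n, h, w, c_out))
    for i in range(kh):
        for j in range(kw):
            # Shifted input window times the (C_in, C_out) slice of the kernel
            out += padded[:, i:i + h, j:j + w, :] @ kernel[i, j]
    return out + bias

batch, num_rois, POOL_SIZE, channels, filters = 2, 4, 7, 16, 8
rng = np.random.default_rng(0)
rois = rng.standard_normal((batch, num_rois, POOL_SIZE, POOL_SIZE, channels))
kernel = rng.standard_normal((3, 3, channels, filters))
bias = rng.standard_normal(filters)

# Per-RoI loop: conv2d_same only accepts 4-D input, so apply it to each rois[:, r]
looped = np.stack([conv2d_same(rois[:, r], kernel, bias) for r in range(num_rois)], axis=1)

# Equivalent view: fold num_rois into the batch axis, convolve once, unfold
folded = rois.reshape(batch * num_rois, POOL_SIZE, POOL_SIZE, channels)
unfolded = conv2d_same(folded, kernel, bias).reshape(
    batch, num_rois, POOL_SIZE, POOL_SIZE, filters)

print(looped.shape)                   # (2, 4, 7, 7, 8)
print(np.allclose(looped, unfolded))  # True
```

In the folded view each RoI is just one more "image" in a larger batch, which is one way to see why the 5-D RoI tensor can be handled by a layer written for 4-D input.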

nnnetizen commented 4 years ago

@VictorAtPL So does that mean the TimeDistributed layer in this case is only used to handle a 5D tensor as though it's a 4D tensor?

VictorAtPL commented 4 years ago

@nnnetizen Let's omit the batch axis for simplicity. In the case of Mask R-CNN (RoIs), instead of applying a convolution to the whole 4-D tensor [num_rois, POOL_SIZE, POOL_SIZE, channels], TimeDistributed applies the convolution, with the same weights, to each of the num_rois 3-D tensors. In other words, the same convolution is applied num_rois times, each time to a tensor of shape [POOL_SIZE, POOL_SIZE, channels].
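To make the weight-sharing point concrete, here is a small NumPy sketch with the batch axis omitted, as in the comment above. `conv_on_one_roi` is a hypothetical 1x1 convolution standing in for Conv2D, and the sizes are illustrative. One kernel is used for every RoI, its parameter count does not depend on num_rois, and changing one RoI leaves the outputs for the other RoIs unchanged, unlike a convolution that slid across the num_rois axis of the whole 4-D tensor.

```python
import numpy as np

rng = np.random.default_rng(1)
POOL_SIZE, channels, filters = 7, 16, 8
# One set of weights for a hypothetical 1x1 conv (channels -> filters),
# shared by every RoI; its size does not depend on num_rois.
kernel = rng.standard_normal((channels, filters))
bias = np.zeros(filters)

def conv_on_one_roi(roi):
    # roi: (POOL_SIZE, POOL_SIZE, channels) -> (POOL_SIZE, POOL_SIZE, filters)
    return roi @ kernel + bias

def time_distributed(fn, rois):
    # rois: (num_rois, POOL_SIZE, POOL_SIZE, channels); fn sees one 3-D RoI at a time
    return np.stack([fn(r) for r in rois])

rois = rng.standard_normal((5, POOL_SIZE, POOL_SIZE, channels))
out = time_distributed(conv_on_one_roi, rois)

# Changing RoI 0 leaves the outputs for all other RoIs untouched
rois_changed = rois.copy()
rois_changed[0] += 1.0
out_changed = time_distributed(conv_on_one_roi, rois_changed)

print(out.shape)                              # (5, 7, 7, 8)
print(np.allclose(out[1:], out_changed[1:]))  # True
print(np.allclose(out[0], out_changed[0]))    # False
print(kernel.size + bias.size)                # 136
```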

Further reading: https://medium.com/smileinnovation/how-to-work-with-time-distributed-data-in-a-neural-network-b8b39aa4ce00

nnnetizen commented 4 years ago

@VictorAtPL Thanks, I think I understand now.