hx173149 / C3D-tensorflow

C3D is a modified version of BVLC tensorflow to support 3D ConvNets.
MIT License

Before about the reshape of dimension transform issues #108

Open LiShuiYu opened 5 years ago

LiShuiYu commented 5 years ago

I noticed that you transpose the dimensions before flattening the features, changing the original [batch, frames, width, height, channels] into [batch, frames, channels, width, height]. From my study of 3D CNNs, I could not understand the significance of this step. Why not write it in the form [batch, channels, frames, width, height]? Looking forward to your reply. Thank you very much.

rocksyne commented 5 years ago

@LiShuiYu There is no significance. It depends solely on the library. For example, if you use TensorFlow, this is the signature of conv3d:

tf.nn.conv3d(
    input,
    filter,
    strides,
    padding,
    data_format='NDHWC',
    dilations=[1, 1, 1, 1, 1],
    name=None
)

With the default data_format='NDHWC', the input must have the shape [batch, in_depth, in_height, in_width, in_channels]. Ref: https://www.tensorflow.org/api_docs/python/tf/nn/conv3d

However, if you are using Theano, the order is different: (batch size, input channels, input depth, input rows, input columns). Ref: http://deeplearning.net/software/theano/library/tensor/nnet/conv.html#theano.tensor.nnet.conv3d

So there is no significance in the order. It differs only in how each library expects you to lay out the input dimensions.
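To make the difference concrete, here is a minimal sketch of how one layout maps onto another. The helper names `layout_permutation` and `permute_shape` are made up for illustration and are not part of TensorFlow or Theano, and the clip size (16 frames of 112x112 RGB) is just an example value.

```python
from typing import List, Sequence, Tuple


def layout_permutation(src: str, dst: str) -> List[int]:
    """Return the axis permutation that turns a `src`-layout tensor into `dst` layout.

    Layouts are strings with one letter per axis, e.g. "NDHWC" or "NCDHW".
    (Hypothetical helper, written only for this example.)
    """
    if sorted(src) != sorted(dst) or len(set(src)) != len(src):
        raise ValueError("layouts must contain the same unique axis letters")
    # Output axis i is taken from the input axis that carries the same letter.
    return [src.index(axis) for axis in dst]


def permute_shape(shape: Sequence[int], perm: Sequence[int]) -> Tuple[int, ...]:
    """Apply an axis permutation to a shape tuple."""
    return tuple(shape[i] for i in perm)


# Channels-last layout (TensorFlow default) to channels-first layout (Theano style).
perm = layout_permutation("NDHWC", "NCDHW")
print(perm)  # [0, 4, 1, 2, 3]

# Example batch: 10 clips, 16 frames, 112x112 pixels, 3 channels (illustrative numbers).
print(permute_shape((10, 16, 112, 112, 3), perm))  # (10, 3, 16, 112, 112)
```

The resulting list is in the form that a transpose call such as `tf.transpose(x, perm=...)` or `numpy.transpose(x, axes=...)` expects, so the same permutation can be used to move real data between the two conventions.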

Therefore, if you use Keras, check which backend you are using (TensorFlow or Theano), or you will run into a lot of trouble. The Keras Conv3D layer has a parameter called data_format, which you set to 'channels_first' or 'channels_last' to say whether the channel axis comes first or last.

keras.layers.Conv3D(filters, kernel_size, strides=(1, 1, 1), padding='valid', data_format=None, dilation_rate=(1, 1, 1), activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)
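As a rough illustration of what the two data_format values imply, the sketch below spells out the input shape Conv3D expects in each case. `conv3d_input_shape` is a hypothetical helper written for this reply, not a Keras function, and the clip dimensions are example values.

```python
from typing import Optional, Tuple


def conv3d_input_shape(
    data_format: str,
    frames: int,
    height: int,
    width: int,
    channels: int,
    batch: Optional[int] = None,
) -> Tuple[Optional[int], ...]:
    """Return the 5D input shape implied by a Keras-style data_format string.

    Hypothetical helper for illustration only; `batch=None` stands for a
    batch size that is not fixed in advance.
    """
    if data_format == "channels_last":
        # (batch, dim1, dim2, dim3, channels)
        return (batch, frames, height, width, channels)
    if data_format == "channels_first":
        # (batch, channels, dim1, dim2, dim3)
        return (batch, channels, frames, height, width)
    raise ValueError("data_format must be 'channels_first' or 'channels_last'")


# Example clip: 16 frames of 112x112 RGB (illustrative numbers).
print(conv3d_input_shape("channels_last", 16, 112, 112, 3))   # (None, 16, 112, 112, 3)
print(conv3d_input_shape("channels_first", 16, 112, 112, 3))  # (None, 3, 16, 112, 112)
```

If your data is stored in one layout and the layer is configured for the other, transpose the data or change data_format so that the two agree.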

The details of this are found here in the Keras documentation: https://keras.io/layers/convolutional/

I hope this clears any confusion. Cheers!