r4ghu / iOS-CoreML-Yolo

Almost Real-time Object Detection using Apple's CoreML and YOLO v1 -
https://goo.gl/8eVso5

Explanation for permutation op #6

Closed by HarshdeepGupta 6 years ago

HarshdeepGupta commented 6 years ago

In the model yoloP1P2P3, there is the following layer:

model.add(Permute((2,3,1)))

Though it gives correct results, I want to ask about the specific order (2,3,1). In my understanding, we are switching from the Height*Width*Depth convention to Depth*Height*Width, and in that case, shouldn't the tuple be (3,1,2)?

Where is my understanding incorrect?

r4ghu commented 6 years ago

In Keras (with TensorFlow as the backend) the default channel ordering is channels_last [NxHxWxD], whereas CoreML's default channel ordering is [NxDxHxW]. As far as I understand, CoreML was developed with inspiration from Caffe. In coremltools.converters, the CoreML team has implemented conversion functions for the major ops such as Conv2D and MaxPool2D, which are used in the majority of DNNs. For minor ops such as reshape, permute and split, they expect developers to use either a library that supports the NxDxHxW format or CoreML's own Neural Network Builder.

When the model is converted from Keras to CoreML, the output of the layer before the permute layer you mentioned is turned from channels_last into channels_first. Hence, all the ops that come after that layer, including the permute layer itself, have to be designed with the channels_first layout in mind.
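To make the index bookkeeping concrete, here is a minimal sketch in plain Python. The helper permute_shape is hypothetical (it is not part of Keras or coremltools), and the sizes D=30, H=7, W=7 are only illustrative. It mimics how a Keras Permute pattern reorders the non-batch axes: the pattern is 1-indexed, and output axis i takes the input axis named at position i.

```python
def permute_shape(shape, dims):
    """Apply a Keras-style Permute pattern to a shape (batch axis excluded).

    `dims` is 1-indexed: the i-th output axis is the input axis dims[i].
    Hypothetical helper for illustration only.
    """
    return tuple(shape[d - 1] for d in dims)


# Illustrative sizes, not taken from the model in this repo.
D, H, W = 30, 7, 7

# After conversion the tensor reaching the permute layer is channels_first.
channels_first = (D, H, W)
# (2,3,1) picks H, then W, then D, so the result is channels_last.
restored = permute_shape(channels_first, (2, 3, 1))

# (3,1,2) is the pattern for the opposite direction: H*W*D to D*H*W.
channels_last = (H, W, D)
flipped = permute_shape(channels_last, (3, 1, 2))

print(restored)  # (7, 7, 30)
print(flipped)   # (30, 7, 7)
```

So (3,1,2) would be the right tuple if the input to the permute layer were still H*W*D, but since the converted model hands it D*H*W, (2,3,1) is what brings it back to H*W*D.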

HarshdeepGupta commented 6 years ago

Thanks for the explanation. So, as far as I understand, we are changing the ordering from D*H*W to H*W*D, because CoreML automatically flips the channel ordering for convolution layers.