toandaominh1997 / EfficientDet.Pytorch

Implementation EfficientDet: Scalable and Efficient Object Detection in PyTorch
MIT License

Convolutions don't match paper #124

Open rmcavoy opened 4 years ago

rmcavoy commented 4 years ago

According to the original paper (see quotes below), the convolutions in this implementation appear to be incorrect: among other problems, normal convolutions are used where the paper specifies depthwise separable ones. With regard to the BiFPN,

Notably, to further improve the efficiency, we use depthwise separable convolution [4, 29] for feature fusion (in the bifpn), and add batch normalization and activation after each convolution.

Thus we know that in the BiFPN, EfficientDet uses depthwise separable convolutions, with batch norm and an activation function after each one, rather than the normal convolutions used here.

Batch normalization is added after every convolution with batch norm decay 0.997 and epsilon 1e-4.

This directly states that all convolutions have batch norms, but it does not say whether depthwise separable convolutions are used outside the BiFPN. It seems likely to me that all convolutions are depthwise separable, because otherwise whichever convolution was not depthwise separable would become the bottleneck.
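The block described in those two quotes (depthwise separable convolution followed by batch norm and an activation) could be sketched in PyTorch roughly as follows. The class name and the Swish/SiLU activation choice are my assumptions; the batch-norm hyperparameters map the paper's TF-style decay 0.997 to PyTorch's momentum = 1 - 0.997 = 0.003, with eps = 1e-4:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Hypothetical sketch of the paper's building block:
    depthwise 3x3 conv -> pointwise 1x1 conv -> batch norm -> activation."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_channels)
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   padding=1, groups=in_channels, bias=False)
        # Pointwise: 1x1 conv mixes information across channels
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                   bias=False)
        # TF decay 0.997 corresponds to PyTorch momentum = 0.003
        self.bn = nn.BatchNorm2d(out_channels, momentum=0.003, eps=1e-4)
        self.act = nn.SiLU()  # Swish; assumed to match EfficientNet's activation

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))
```

For 64-in/64-out channels, this block has roughly 4.8k parameters versus ~36.9k for a plain 3x3 convolution, which is why swapping normal convolutions for separable ones changes the parameter count so dramatically.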

rmcavoy commented 4 years ago

I went through and confirmed that if depthwise separable convolutions are used everywhere, and the code is fixed so it properly sets Dbifpn, Dclass/box, Wbifpn, Wclass/box, etc. to the correct values (see the other issues), then the total size of the network, including both the backbone and the detector heads, closely matches the parameter counts from the paper (you need to neglect the parameters of the final classifier in the backbone, since they aren't used in EfficientDet at all). With this update, I am only short by between 0.01 and 0.5 million parameters for each of the models from D0 to D7, instead of being massively over on each when using normal convolutions.
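A minimal sketch of the kind of parameter counting used above, excluding submodules by name prefix (the way one would skip the backbone's unused final classifier). The `Toy` model and the submodule names here are made up for illustration, not taken from this repo:

```python
import torch.nn as nn

def count_parameters(model, exclude=()):
    """Count parameters, skipping any parameter whose name starts with one
    of the prefixes in `exclude` (e.g. the backbone's unused classifier)."""
    return sum(p.numel() for name, p in model.named_parameters()
               if not any(name.startswith(pfx) for pfx in exclude))

class Toy(nn.Module):
    """Stand-in for backbone + classifier head (hypothetical names)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Conv2d(3, 8, 3)      # kept in the count
        self.classifier = nn.Linear(8, 10)      # excluded, like the unused head

toy = Toy()
total = count_parameters(toy)
without_head = count_parameters(toy, exclude=("classifier",))
```

Running the same kind of count on the full detector, minus the backbone classifier, is how the comparison against the paper's reported parameter numbers was done.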

rmcavoy commented 4 years ago

Edit: Turns out Figure 3 is just misleading.

sevakon commented 4 years ago

@rmcavoy thank you for your notes. Am I right that you are claiming that all the convolution layers in the BiFPN and the box prediction network should be depthwise separable?

rmcavoy commented 4 years ago

@sevakon Yes, from my tests of the parameter counts, every part of the detection network uses depthwise separable convolutions instead of normal convolutions. This is consistent with the original paper using "conv" to denote the depthwise separable convolutions in the BiFPN (as explicitly stated in the paper) and then using "conv" again for the classification/regression networks without clarifying further.

rmcavoy commented 4 years ago

Also note that you need to make sure there is a batch norm after every depthwise separable layer to match the original paper.

turner-rovco commented 4 years ago

Just to be totally clear "every part of the detection network" means all of the BiFPN and RetinaHead?

rmcavoy commented 4 years ago

Yes, only the BiFPN and the RetinaHead. The backbone EfficientNet remains unchanged versus what is there currently.

bluesky314 commented 4 years ago

@toandaominh1997 Are you going to respond to any of these issues ever?

dkoguciuk commented 4 years ago

Also, Figure 3 shows each layer of the box/classification network having two (depth-wise separable) convolutions per layer as they surround the two convolutions with a dotted box just as they did the single layer of the bifpn.

@rmcavoy, I have assumed just like you that each layer of the box/classification network has two convs, but according to the original implementation that's not the case.
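To make the correction concrete: a sketch of a box/classification head in which each of the repeated layers contains a single depthwise separable convolution (plus BN and activation), not two. The function names, the SiLU activation, and the final prediction conv without BN/activation are my assumptions for illustration, not code from the official implementation:

```python
import torch.nn as nn

def separable_conv(ch):
    """One head layer: a single depthwise separable conv + BN + activation."""
    return nn.Sequential(
        nn.Conv2d(ch, ch, 3, padding=1, groups=ch, bias=False),  # depthwise
        nn.Conv2d(ch, ch, 1, bias=False),                        # pointwise
        nn.BatchNorm2d(ch, momentum=0.003, eps=1e-4),
        nn.SiLU(),
    )

def make_head(width, depth, out_channels):
    """Stack `depth` single-conv layers, then a raw prediction conv whose
    channel count would be num_anchors * num_classes (classification)
    or num_anchors * 4 (box regression)."""
    layers = [separable_conv(width) for _ in range(depth)]
    layers.append(nn.Conv2d(width, out_channels, 3, padding=1))
    return nn.Sequential(*layers)
```

With two convolutions per layer instead of one, the head's parameter count would be nearly doubled, which is how the discrepancy shows up in the totals discussed above.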

rmcavoy commented 4 years ago

@dkoguciuk Yep, I discovered that as well but forgot to update this post.

dkoguciuk commented 4 years ago

@rmcavoy, don't worry - you've helped a lot of people here with your comments :)

zylo117 commented 4 years ago

Yes, only the BiFPN and the RetinaHead. The backbone EfficientNet remains unchanged versus what is there currently.

You are right. This repo also changed the conv stride, so the pretrained weights are useless and performance will be bad. I re-implemented EfficientDet step by step from the official implementation; from D0 to D5, the results seem nice.

https://github.com/zylo117/Yet-Another-EfficientDet-Pytorch

turner-rovco commented 4 years ago

@zylo117 Just checked out your repo, it looks promising, thank you! The most important thing is for the repo owner to keep going, working on it, fixing issues etc, I hope you'll stick it out!