ijkguo / mx-rcnn

Parallel Faster R-CNN implementation with MXNet.

Implementation differences compared to Caffe #59

Closed nirbenz closed 7 years ago

nirbenz commented 7 years ago

This isn't an issue per se; rather, I'm curious about some differences from the Caffe implementation and why they were made:

  1. Filtering of region proposals is done by confidence first, then by NMS. Caffe has this the other way around.
  2. While benchmarking both implementations on randomly generated images, I noticed that this implementation consistently outputs 300 region proposals, while the Caffe implementation outputs fewer (around 100 +/- 10 for those random images).
  3. For a model converted with the included converter, the outputs differ noticeably from the original (despite, surprisingly, very similar mAPs).

Curious to hear the author's thoughts on these questions.

Thanks!

ijkguo commented 7 years ago
  1. I don't think so. They filter proposals by confidence first as well.
  2. MXNet cannot reshape outputs on the fly without hurting performance, so the number of proposals is fixed and the extra slots are padded (see the sketch below).
  3. It has something to do with the different pooling implementations.
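
For reference, here is a minimal NumPy sketch of the usual Faster R-CNN proposal path described above (the parameter values and the repeat-first-box padding are illustrative, not necessarily exactly what this repo does): proposals are ranked by objectness score and truncated before NMS, NMS runs second, and the survivors are padded back up to a fixed post-NMS count so the output shape never changes.

```python
import numpy as np

def greedy_nms(boxes, scores, thresh):
    """Plain greedy NMS: return indices of kept boxes, highest score first."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        iou = (w * h) / (areas[i] + areas[order[1:]] - w * h)
        order = order[1:][iou <= thresh]
    return keep

def filter_proposals(boxes, scores,
                     pre_nms_top_n=6000, post_nms_top_n=300, nms_thresh=0.7):
    """Illustrative proposal path: score ranking first, NMS second, pad last."""
    # 1. keep the highest-scoring proposals before NMS
    order = scores.argsort()[::-1][:pre_nms_top_n]
    boxes, scores = boxes[order], scores[order]
    # 2. non-maximum suppression on the score-sorted boxes
    keep = greedy_nms(boxes, scores, nms_thresh)[:post_nms_top_n]
    boxes = boxes[keep]
    # 3. pad (here by repeating the first box) so the output always has
    #    exactly post_nms_top_n rows, matching a fixed output shape
    if len(boxes) < post_nms_top_n:
        pad = np.repeat(boxes[:1], post_nms_top_n - len(boxes), axis=0)
        boxes = np.vstack((boxes, pad))
    return boxes
```

Padding to exactly `post_nms_top_n` rows is why the MXNet side always reports 300 proposals, while Caffe simply returns however many boxes survive NMS.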
nirbenz commented 7 years ago

Thanks for your quick reply!

  1. You're right, my bad (it is performed in that order in some places of the code, though, which is strange in itself).
  2. The 'random images' are all of the same resolution, so I don't see how that MXNet limitation would affect the results.
  2.5. Not sure I understand what you mean by padding, though. I did notice that for a new resolution the MutableModule class is recreated to accommodate it.
  3. Interesting. Could you share more?

Thanks!

ijkguo commented 7 years ago

2.5. I set the number of output proposals to a fixed value. It could be made configurable, but currently it is not.

  3. Refer to the MXNet docs on sym.Pooling. I am not absolutely sure.
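
A small sketch of the pooling detail referred to above (the shapes below are an illustration, not output quoted from this repo): Caffe's pooling rounds the output size up, while MXNet's `sym.Pooling` defaults to `pooling_convention='valid'` (round down); passing `pooling_convention='full'` reproduces Caffe's ceil behaviour.

```python
import mxnet as mx

data = mx.sym.Variable('data')
# MXNet default: 'valid' rounds the pooled size down (floor)
pool_valid = mx.sym.Pooling(data=data, kernel=(2, 2), stride=(2, 2),
                            pool_type='max', pooling_convention='valid')
# 'full' rounds up (ceil), which is what Caffe's pooling layer does
pool_full = mx.sym.Pooling(data=data, kernel=(2, 2), stride=(2, 2),
                           pool_type='max', pooling_convention='full')

# Odd spatial sizes occur inside VGG-16 for arbitrary test image shapes.
_, out_valid, _ = pool_valid.infer_shape(data=(1, 64, 37, 37))
_, out_full, _ = pool_full.infer_shape(data=(1, 64, 37, 37))
print(out_valid)  # [(1, 64, 18, 18)]  floor((37-2)/2)+1 = 18
print(out_full)   # [(1, 64, 19, 19)]  ceil((37-2)/2)+1 = 19, matches Caffe
```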
nirbenz commented 7 years ago
  3. Thanks. That actually explains the different outputs for converted VGG/AlexNet models, but not for ResNet! I will look into that.
Zehaos commented 7 years ago

@nirbenz Any progress on this issue?

nirbenz commented 7 years ago

Nope. I believe a robust solution would be to either fine-tune the converted network for a few more iterations (to let training absorb the numerical differences caused by the different pooling implementation) or to use architectures that don't rely on pooling to begin with - such as Faster R-CNN with ResNet. I can confirm that for converted ResNet models the output differences are much smaller than for VGG-16 models.

Of course, a better understanding of the differences between Caffe pooling and MXNet pooling/padding could also solve this - I did something similar when adding BatchNorm support to the MXNet converter - but I didn't get around to it this time, and VGG-16 is an old architecture anyway. :)
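
To make the rounding difference concrete, here is a back-of-the-envelope calculation (assuming Caffe's ceil formula and MXNet's default floor/'valid' formula; the 600-pixel scale is just the typical Faster R-CNN test setting) for VGG-16's 2x2/stride-2 max-pool stages:

```python
import math

def pool_out(size, kernel=2, stride=2, pad=0, ceil_mode=True):
    # Caffe rounds the pooled size up (ceil); MXNet's default
    # pooling_convention='valid' rounds down (floor).
    rnd = math.ceil if ceil_mode else math.floor
    return int(rnd((size + 2 * pad - kernel) / float(stride))) + 1

caffe = mxnet = 600  # typical Faster R-CNN test scale (shorter image side)
for stage in range(1, 5):  # pool1..pool4 precede conv5_3 in VGG-16
    caffe = pool_out(caffe, ceil_mode=True)
    mxnet = pool_out(mxnet, ceil_mode=False)
    print('pool%d: caffe=%3d  mxnet=%3d' % (stage, caffe, mxnet))
# pool1: 300/300, pool2: 150/150, pool3: 75/75, pool4: 38/37
# -> the conv5_3 feature map (what RPN and RoI pooling consume) is already
#    one cell larger in Caffe, which shifts anchors and RoI pooling bins.
```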

Nir

Zehaos commented 7 years ago

@nirbenz Thanks.