rbgirshick / py-faster-rcnn

Faster R-CNN (Python implementation) -- see https://github.com/ShaoqingRen/faster_rcnn for the official MATLAB version

two sibling layers in RPN are fully connected or 1x1 conv layers? #952

Open sanhai77 opened 1 year ago

sanhai77 commented 1 year ago

I read the Faster R-CNN paper and I'm confused about the region proposal section (3.1). I can't tell whether the sibling layers in the RPN are fully connected layers or 1x1 convolutional layers. The paper says:

This feature is fed into two sibling fully-connected layers—a box-regression layer (reg) and a box-classification layer (cls).

This architecture is naturally implemented with an n×n convolutional layer followed by two sibling 1 × 1 convolutional layers (for reg and cls, respectively).

I know the project is implemented with conv layers, but I can't resolve the apparent contradiction between these two statements in the paper.

karandeepdps commented 1 year ago

The architecture of the region proposal network (RPN) described in the Faster R-CNN paper can be confusing at first, because the term "fully-connected layers" is used there in a way that may be misleading.

In the RPN, there are indeed two sibling layers: a box-regression layer (reg) and a box-classification layer (cls). They are called "fully-connected" because, for each n×n sliding window on the convolutional feature map, the mini-network behaves like an FC layer over that window's features. Crucially, though, the same weights are shared across all sliding-window positions, so these are not fully connected layers in the traditional sense of connecting to the entire feature map at once.

When the authors say, "This architecture is naturally implemented with an n×n convolutional layer followed by two sibling 1×1 convolutional layers (for reg and cls, respectively)," they're referring to the fact that these two sibling layers are implemented as 1x1 convolutional layers, which have the same effect as fully connected layers when applied on a region of interest.

To further clarify, a 1x1 convolution with k output channels applies the same k×C weight matrix to the C-dimensional channel vector at every spatial position. That is exactly a fully connected layer operating on one position's feature vector, replicated with shared weights across all positions. Unlike a traditional fully connected layer over the whole feature map, it preserves the spatial dimensions of the input, which is important for tasks like object localization and detection.
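Here is a minimal NumPy sketch of that equivalence. The sizes are illustrative (a 256-d intermediate feature as in the paper, and 2k = 18 cls scores for k = 9 anchors); the feature map and weights are random, since the point is only that the two views compute the same thing:

```python
import numpy as np

C, H, W, OUT = 256, 7, 7, 18  # channels, spatial dims, 2k cls scores (k=9 anchors)

rng = np.random.default_rng(0)
feat = rng.standard_normal((C, H, W))    # conv feature map
weight = rng.standard_normal((OUT, C))   # shared FC / 1x1-conv weights
bias = rng.standard_normal(OUT)

# (a) "fully-connected" view: apply the same FC layer at every sliding position
fc_out = np.empty((OUT, H, W))
for y in range(H):
    for x in range(W):
        fc_out[:, y, x] = weight @ feat[:, y, x] + bias

# (b) 1x1-convolution view: one matrix multiply over all positions at once
conv_out = (weight @ feat.reshape(C, -1) + bias[:, None]).reshape(OUT, H, W)

assert np.allclose(fc_out, conv_out)  # identical results
```

Both views use the same weight matrix; the 1x1 conv is just the vectorized, weight-shared form of sliding the FC layer over every position.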

So, in summary, the sibling layers (reg and cls) in the RPN are "fully connected" only in the per-window sense; because their weights are shared across all sliding-window positions, they are naturally implemented as 1x1 convolutional layers, exactly as the authors state.

sanhai77 commented 1 year ago

Thank you for the quick and complete answer