Closed swoook closed 3 years ago
Fig. 4. The overview of DDRNets on semantic segmentation. “RB” denotes sequential residual basic blocks. “RBB” denotes a single residual bottleneck block. “DAPPM” denotes the Deep Aggregation Pyramid Pooling Module. “Seg. Head” denotes the segmentation head. Black solid lines denote information paths with data processing (including upsampling and downsampling) and black dashed lines denote information paths without data processing. “sum” denotes pointwise summation. Dashed boxes denote the components which are discarded in the inference stage.
self.layer5 in sod.models.ddrnet_23_slim.DualResNet
self.layer5 = self._make_layer(Bottleneck, planes * 8, planes * 8, 1, stride=2)
self.layer5 = self._make_layer(Bottleneck, planes * 8, planes * 8, 1, stride=1)
Is your feature request related to a problem? Please describe.
See the four sub-branches that each include a pooling layer (red rectangles) in Fig. 5.
scale1: AvgPool2d(kernel_size=5, stride=2, padding=2)
scale2: AvgPool2d(kernel_size=9, stride=4, padding=4)
scale3: AvgPool2d(kernel_size=17, stride=8, padding=8)
scale4: AdaptiveAvgPool2d((1, 1))
Relative to the input image, DAPPM down-samples by factors of 128, 256 and 512 in scale1, scale2 and scale3, respectively
However, recall that DDRNet is trained and benchmarked on two datasets:
The resolutions of their frames are (2048, 1024) and (1024, 1024), respectively
In those cases, scale3 outputs feature maps with spatial sizes of (4, 2) and (2, 2), respectively
But assume the input images are <= (512, 512)
Then scale3 and scale4 both output feature maps with a spatial size of (1, 1)
This means they have almost the same receptive field
I.e. one of them may be redundant
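The sizes quoted above can be checked with the standard output-size formula for average pooling, floor((n + 2p - k) / s) + 1. The sketch below is plain Python, not code from the repository. It assumes the feature map entering DAPPM is 1/64 of the image resolution (which is what the factors 128, 256 and 512 imply for strides 2, 4 and 8), and the helper names `pool_out` and `dappm_sizes` are made up for illustration.

```python
# Pooling settings of the strided DAPPM branches: (kernel_size, stride, padding).
POOLS = {
    "scale1": (5, 2, 2),
    "scale2": (9, 4, 4),
    "scale3": (17, 8, 8),
}

# Assumption: the feature map entering DAPPM is 1/64 of the input image.
BACKBONE_STRIDE = 64


def pool_out(n, kernel, stride, padding):
    """Output length of AvgPool2d along one axis (ceil_mode=False)."""
    return (n + 2 * padding - kernel) // stride + 1


def dappm_sizes(width, height):
    """Spatial size (w, h) of each DAPPM branch for a (width, height) image."""
    w, h = width // BACKBONE_STRIDE, height // BACKBONE_STRIDE
    sizes = {name: (pool_out(w, *p), pool_out(h, *p)) for name, p in POOLS.items()}
    sizes["scale4"] = (1, 1)  # AdaptiveAvgPool2d((1, 1)) always yields 1x1
    return sizes


for resolution in [(2048, 1024), (1024, 1024), (512, 512)]:
    print(resolution, dappm_sizes(*resolution))
```

Under these assumptions scale3 comes out as (4, 2), (2, 2) and (1, 1) for the three resolutions, so at (512, 512) it has the same output size as scale4.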
If the input images are small, wouldn't it be better to eliminate some of the pooling branches in DAPPM, given the receptive field?
Decide after checking whether the existing architecture trains well
Describe the solution you'd like
If the input images are small, wouldn't it be better to eliminate some of the pooling branches in DAPPM, given the receptive field?
Decide after checking whether the existing architecture trains well
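One possible way to decide which branches to drop is sketched below in plain Python. It is only a selection rule, not a trained or validated architecture change: a strided branch is treated as redundant when its output is already (1, 1), because the global-pooling branch scale4 produces the same size. The function name `non_redundant_scales` and the 1/64 backbone stride are assumptions for illustration, not part of the repository.

```python
def non_redundant_scales(width, height, backbone_stride=64):
    """Return the DAPPM pooling branches worth keeping for a given image size.

    Assumption: the DAPPM input is (width, height) / backbone_stride.
    """
    # (name, kernel_size, stride, padding) of the strided average pools.
    pools = [
        ("scale1", 5, 2, 2),
        ("scale2", 9, 4, 4),
        ("scale3", 17, 8, 8),
    ]
    w, h = width // backbone_stride, height // backbone_stride

    keep = []
    for name, k, s, p in pools:
        out_w = (w + 2 * p - k) // s + 1
        out_h = (h + 2 * p - k) // s + 1
        # A branch that collapses to 1x1 duplicates the global pooling branch.
        if (out_w, out_h) != (1, 1):
            keep.append(name)

    keep.append("scale4")  # global average pooling is always kept
    return keep


print(non_redundant_scales(2048, 1024))
print(non_redundant_scales(512, 512))
```

With this rule, scale3 is kept for (2048, 1024) inputs and dropped for (512, 512) inputs. Whether dropping it actually helps would still need to be confirmed by training.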
Describe alternatives you've considered
None
Additional context
DUTS-TR is a dataset for salient object detection
We'd like to train DDRNet on this dataset for real-time salient object detection
Refer to #6 from swoook/ucnet (github) for more details