Modify the architecture of DAPPM considering the spatial size of the feature maps generated by DAPPM

swoook commented 3 years ago

Is your feature request related to a problem? Please describe.

DAPPM uses large pooling kernels and exponentially increasing strides to:

enlarge effective receptive fields
fuse multi-scale context

See the four sub-branches that include a pooling layer (red rectangles) in Fig. 5.

| name | operation | down-sampling rate (relative to the input image) |
| --- | --- | --- |
| `scale1` | `AvgPool2d(kernel_size=5, stride=2, padding=2)` | 128 |
| `scale2` | `AvgPool2d(kernel_size=9, stride=4, padding=4)` | 256 |
| `scale3` | `AvgPool2d(kernel_size=17, stride=8, padding=8)` | 512 |
| `scale4` | `AdaptiveAvgPool2d((1, 1))` | (H, W) |

DAPPM down-samples an input image by factors of 128, 256, and 512 (the backbone's factor of 64 times the pooling strides of 2, 4, and 8).
However, recall that DDRNet is trained and benchmarked on two datasets:
1. Cityscapes
2. CamVid
The resolutions of their frames are (2048, 1024) and (960, 720), respectively.
In those cases, scale3 outputs feature maps with spatial sizes of (4, 2) and (2, 2), respectively.
But assume the input images are no larger than (512, 512).
Then scale3 and scale4 both output feature maps with a spatial size of (1, 1).
This means they have almost the same receptive field.
I.e., they may be redundant.
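
The sizes above can be checked with the standard output-size formula for pooling with `ceil_mode=False`, `floor((n + 2p - k) / s) + 1`. Below is a minimal sketch in plain Python. It assumes the DAPPM input is the 1/64-resolution feature map and that the backbone rounds up on sizes that are not divisible by 64; `pool_out` and `dappm_sizes` are hypothetical helper names, not functions from the repository.

```python
import math

def pool_out(n, kernel, stride, padding):
    """Output length of AvgPool2d along one axis (ceil_mode=False)."""
    return (n + 2 * padding - kernel) // stride + 1

# (kernel_size, stride, padding) of the three strided DAPPM branches
BRANCHES = {"scale1": (5, 2, 2), "scale2": (9, 4, 4), "scale3": (17, 8, 8)}

def dappm_sizes(width, height, backbone_rate=64):
    """Spatial size (w, h) of each pooled branch for a given input image size."""
    # Assumption: the backbone output is ceil(size / backbone_rate) per axis.
    w = math.ceil(width / backbone_rate)
    h = math.ceil(height / backbone_rate)
    sizes = {
        name: (pool_out(w, *cfg), pool_out(h, *cfg))
        for name, cfg in BRANCHES.items()
    }
    sizes["scale4"] = (1, 1)  # AdaptiveAvgPool2d((1, 1)) always yields 1x1
    return sizes

print(dappm_sizes(2048, 1024))  # full Cityscapes frame
print(dappm_sizes(512, 512))    # small input
```

For a (2048, 1024) input this gives (4, 2) for scale3, while for a (512, 512) input scale3 and scale4 both come out as (1, 1).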

Describe the solution you'd like
If the input images are small, wouldn't it be better to eliminate some of the pooling branches in DAPPM, given how much their receptive fields overlap?
Decide after checking whether the existing architecture trains well.

Describe alternatives you've considered
None

Additional context
DUTS-TR is a dataset for salient object detection
We'd like to train DDRNet on this dataset for real-time salient object detection
Refer to #6 from swoook/ucnet (github) for more details

swoook commented 3 years ago

We decided not to eliminate any pooling branches in DAPPM,
but to modify the expected size of the input tensor for DAPPM instead.

Fig. 4. The overview of DDRNets on semantic segmentation. “RB” denotes sequential residual basic blocks. “RBB” denotes single residual bottleneck block. “DAPPM” denotes the Deep Aggregation Pyramid Pooling Module. “Seg. Head” denotes the segmentation head. Black solid lines denote information paths with data processing (including upsampling and downsampling) and black dashed lines denote information paths without data processing. “sum” denotes pointwise summation. Dashed boxes denote the components which are disregarded in the inference stage.

The layer before DAPPM is called RBB 1/64.
Its output is down-sampled by a factor of 64 relative to the input image.
Recall how DDRNet is trained on Cityscapes:
images are randomly cropped to (1024, 1024).
In that case, RBB 1/64 outputs feature maps with a spatial size of (16, 16).
We'd like it to output feature maps of size (16, 16) even when the input image is (512, 512).
RBB 1/64 is implemented as self.layer5 in sod.models.ddrnet_23_slim.DualResNet.
We can achieve this by changing its stride from 2 to 1.
Before:

        self.layer5 = self._make_layer(Bottleneck, planes * 8, planes * 8, 1, stride=2)

After:

        self.layer5 = self._make_layer(Bottleneck, planes * 8, planes * 8, 1, stride=1)
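
To see what the stride change does to the DAPPM input, here is a small sketch of the arithmetic. It assumes that the stages before `layer5` down-sample by 32 in total and that `layer5` multiplies this by its own stride; `dappm_input_size` is a hypothetical helper, not a function in the repository.

```python
def dappm_input_size(image_size, layer5_stride, rate_before_layer5=32):
    """Side length of the feature map fed to DAPPM for a square input image."""
    # Assumption: total rate = rate of earlier stages * stride of layer5.
    total_rate = rate_before_layer5 * layer5_stride
    return image_size // total_rate

# (input image side, stride of layer5)
for image_size, stride in [(1024, 2), (512, 2), (512, 1)]:
    side = dappm_input_size(image_size, stride)
    print(f"input {image_size}, layer5 stride {stride} -> DAPPM input ({side}, {side})")
```

With stride 1, a (512, 512) image reaches DAPPM at (16, 16), the same size a (1024, 1024) Cityscapes crop reaches with the original stride of 2.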

swoook commented 3 years ago

Implemented the request from the 1st comment and trained it (run-14).
It doesn't dramatically improve MAE (0.047 at 24 epochs).
However, I confirmed that the loss decreased more steadily than before.
Then, how about changing the down-sampling rate of the whole backbone?
Recall that DDRNet is trained on Cityscapes with an input size of (1024, 1024),
and its backbone (not DAPPM) down-samples the input tensor by factors of 4, 8, 16, 32, and 64.
The output tensors then have sizes of 256, 128, 64, 32, and 16, respectively.
We'd like it to output feature maps of similar sizes for (512, 512) inputs,
i.e. decrease the down-sampling rates to 2, 4, 8, 16, and 32.
We can achieve this by changing the down-sampling rate of RB 1/4 from 4 to 2.
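
The effect of that single change can be sketched as follows. The sketch assumes each stage after the first doubles the cumulative down-sampling rate, which matches the rates listed above; `cumulative_rates` and `stage_sizes` are hypothetical helper names used only for illustration.

```python
def cumulative_rates(first_rate, n_stages=5):
    """Cumulative down-sampling rates, assuming each later stage has stride 2."""
    return [first_rate * 2 ** i for i in range(n_stages)]

def stage_sizes(image_size, rates):
    """Side length of each stage's output for a square input image."""
    return [image_size // r for r in rates]

original = cumulative_rates(4)  # RB 1/4 down-samples by 4
proposed = cumulative_rates(2)  # RB 1/4 changed to down-sample by 2

print(original, stage_sizes(1024, original))
print(proposed, stage_sizes(512, proposed))
```

Both lines print the same stage sizes, 256, 128, 64, 32, and 16, so a (512, 512) input would see the same feature-map sizes as a (1024, 1024) Cityscapes crop.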

swoook commented 3 years ago

☠️ Confirmed that the request above drastically increases the processing time on CPU.
Specifically, FPS on a Threadripper 2950X CPU decreases from 18.6 to 6.9.
❗ Do not lower the down-sampling rates of the whole backbone.
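
A rough way to see why: for a fixed input size, halving every down-sampling rate doubles the side of each feature map, so each stage processes four times as many spatial positions. The sketch below does this arithmetic and puts it next to the measured slowdown. It assumes a (512, 512) input and that cost scales linearly with the number of spatial positions, which ignores memory and threading effects, so treat it as a back-of-the-envelope estimate only; `relative_positions` is a hypothetical helper.

```python
def relative_positions(image_size, old_rates, new_rates):
    """Ratio of total spatial positions over all stages, new vs. old."""
    old = sum((image_size // r) ** 2 for r in old_rates)
    new = sum((image_size // r) ** 2 for r in new_rates)
    return new / old

# Assumption: square (512, 512) input, same number of stages in both cases.
estimated = relative_positions(512, [4, 8, 16, 32, 64], [2, 4, 8, 16, 32])
measured = 18.6 / 6.9  # FPS before / FPS after, from the measurement above

print(f"estimated work ratio: {estimated:.2f}, measured slowdown: {measured:.2f}")
```

The measured slowdown is somewhat smaller than this estimate, which is plausible because not every part of the network is affected equally.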

swoook / ddrnet

Modify the architecture of DAPPM considering the spatial size of the feature maps generated by DAPPM #7