zzangjinsun / NLSPN_ECCV20

Park et al., Non-Local Spatial Propagation Network for Depth Completion, ECCV, 2020
MIT License

about the result in table 3 #13

Closed Xuanmeng-Zhang closed 3 years ago

Xuanmeng-Zhang commented 3 years ago

Hi, Park. I'm confused about the result in Table 3 (m), where the RMSE is 884.1 mm. I think setting (m) is the full proposed framework. What is the difference between (m) and the released model, which reaches 771.8 mm on the validation dataset?

zzangjinsun commented 3 years ago

Hi @Xuanmeng-Zhang,

All the ablation results reported in Tab. 3 of the main paper are obtained from models trained on 912 x 228 center-cropped patches for 20 epochs with a batch size of 12. (Please refer to Sec. 6.3.)

The final validation performance of the full model trained on the full data is 771.8 mm, which is the number reported in this repository.

Thank you!

Xuanmeng-Zhang commented 3 years ago

Thank you for your answer! I have some questions about the code below.

    for idx_off in range(0, self.num + 1):
        ww = idx_off % self.k_f
        hh = idx_off // self.k_f

        # Skip the reference (center) pixel itself.
        if ww == (self.k_f - 1) / 2 and hh == (self.k_f - 1) / 2:
            continue

        offset_tmp = offset_each[idx_off].detach()

        # Add this neighbor's displacement on the k_f x k_f kernel grid so the
        # offset becomes a displacement from the reference pixel.
        offset_tmp[:, 0, :, :] = \
            offset_tmp[:, 0, :, :] + hh - (self.k_f - 1) / 2
        offset_tmp[:, 1, :, :] = \
            offset_tmp[:, 1, :, :] + ww - (self.k_f - 1) / 2

        # Sample the confidence map at this neighbor's location.
        conf_tmp = ModulatedDeformConvFunction.apply(
            confidence, offset_tmp, modulation_dummy, self.w_conf,
            self.b, self.stride, 0, self.dilation, self.groups,
            self.deformable_groups, self.im2col_step)
        list_conf.append(conf_tmp)

I wonder why offset_tmp is used to compute conf_tmp. In other words, can we use the variable offset to compute conf_tmp directly, as in the following code?

    def _propagate_once(self, feat, offset, aff):
        feat = ModulatedDeformConvFunction.apply(
            feat, offset, aff, self.w, self.b, self.stride, self.padding,
            self.dilation, self.groups, self.deformable_groups,
            self.im2col_step)

        return feat

Thank you!

zzangjinsun commented 3 years ago

The current implementation appends each neighbor's confidence to a list, and the final confidence volume for the neighbors is constructed by concatenating those per-neighbor confidences.

If we directly use ModulatedDeformConvFunction, it sums each neighbor's confidence along the channel, so it is difficult to obtain each neighbor's confidence separately.
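For illustration, here is a minimal PyTorch sketch of what the per-neighbor loop above computes, written with grid_sample instead of the actual ModulatedDeformConvFunction call; the function name sample_conf_per_neighbor and the tensor layout are made up for this example and are not from the repository:

    import torch
    import torch.nn.functional as F

    def sample_conf_per_neighbor(confidence, offsets):
        # confidence: (B, 1, H, W)
        # offsets: list with one (B, 2, H, W) tensor per neighbor,
        #          channel 0 = vertical (dy), channel 1 = horizontal (dx), in pixels
        B, _, H, W = confidence.shape
        ys, xs = torch.meshgrid(
            torch.arange(H, dtype=confidence.dtype, device=confidence.device),
            torch.arange(W, dtype=confidence.dtype, device=confidence.device),
            indexing='ij')
        base = torch.stack((xs, ys), dim=-1)               # (H, W, 2) in (x, y) order

        list_conf = []
        for off in offsets:
            # Absolute sampling position of this neighbor for every pixel.
            loc = base + off.permute(0, 2, 3, 1).flip(-1)  # (B, H, W, 2), (x, y)
            # Normalize to [-1, 1] as required by grid_sample.
            gx = 2.0 * loc[..., 0] / max(W - 1, 1) - 1.0
            gy = 2.0 * loc[..., 1] / max(H - 1, 1) - 1.0
            grid = torch.stack((gx, gy), dim=-1)
            # Bilinear sampling keeps this neighbor's confidence in its own map.
            list_conf.append(F.grid_sample(confidence, grid, align_corners=True))

        # (B, num_neighbors, H, W): one separate confidence channel per neighbor
        return torch.cat(list_conf, dim=1)

Each neighbor then keeps its own confidence channel in the concatenated volume, whereas a single deformable convolution over all offsets would collapse them into one summed channel.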

I think using groups or deformable groups would enable a more efficient implementation, but at the time of development I adopted the 100% correct implementation, even though it is slightly inefficient.
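As a rough sketch of that idea (this is not the released code), the confidence map could be replicated across channels and sampled in one call with groups and deformable groups both set to the number of neighbors. The argument order follows the ModulatedDeformConvFunction.apply call in the snippet above; offset_nghbr (the concatenated non-center offsets, already shifted to displacements from the reference pixel, shaped (B, 2 * num, H, W)) and num are assumed to be prepared by the caller:

    # Sketch only: a single deformable-conv call that keeps neighbors separated.
    # Assumes DCNv2-style semantics where each deformable group reads its own
    # 2-channel slice of the offset tensor (1 x 1 kernel here).
    B, _, H, W = confidence.shape

    conf_rep = confidence.repeat(1, num, 1, 1)                      # (B, num, H, W)
    mask_dummy = torch.ones((B, num, H, W), device=confidence.device)
    w_group = torch.ones((num, 1, 1, 1), device=confidence.device)  # 1 x 1 kernel per group
    b_group = torch.zeros(num, device=confidence.device)

    # groups == deformable_groups == num: each neighbor stays in its own
    # input/output channel, so nothing is summed across neighbors.
    conf_aff = ModulatedDeformConvFunction.apply(
        conf_rep, offset_nghbr, mask_dummy, w_group, b_group,
        self.stride, 0, self.dilation, num, num, self.im2col_step)  # (B, num, H, W)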