leonzfa / iResNet


displacement range in correlation and EPE computation #4

Open stalin18 opened 6 years ago

stalin18 commented 6 years ago

Hi, in your paper you wrote that "Correlation with a large displacement (i.e., 40) is performed between conv2a and conv2b" and "correlation with a small displacement (i.e., 20) is performed to capture fine-grained but short range correspondence". However, in Table 1 (detailed architecture), the output of corr1d has 81 channels, while the output of r_corr has 41 channels. Is this a mistake, or do you actually use displacements of 80 and 40 instead of 40 and 20?

Lastly, I just wanted to confirm this: is the EPE reported in Table 2 (Comparative results on the Scene Flow dataset for networks with different settings) calculated on the entire Scene Flow test dataset, without discarding any images or pixels? And is it only in Table 3 (comparison with CRL) that you discard some test images, following the same procedure as in the CRL paper?

Thank you very much for the code and all the help!

leonzfa commented 6 years ago

@stalin18 Hi, we use two-direction correlation in the corr1d and r_corr layers in the paper, so the channel numbers are 40×2+1 = 81 and 20×2+1 = 41, respectively.

layer {
  name: "corr"
  type: "Correlation1D"
  bottom: "conv2a"
  bottom: "conv2b"
  top: "corr"
  correlation_param {
    pad: 40
    kernel_size: 1
    max_displacement: 40
    stride_1: 1
    stride_2: 1
  }
}

In fact, there is no need to use two-direction correlation in the Correlation1D layer; you can set the parameter single_direction to -1:

layer {
  name: "corr"
  type: "Correlation1D"
  bottom: "conv2a"
  bottom: "conv2b"
  top: "corr"
  correlation_param {
    pad: 40
    kernel_size: 1
    max_displacement: 40
    single_direction: -1
    stride_1: 1
    stride_2: 1
  }
}
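
The effect of the two settings on the output shape can be illustrated with a small NumPy sketch. This is not the Caffe Correlation1D layer itself: the function name `correlation_1d`, the channel-mean normalization, and the reading of `single_direction = -1` as "negative shifts only" are assumptions made for illustration.

```python
import numpy as np


def correlation_1d(feat_a, feat_b, max_displacement, single_direction=0):
    """Horizontal-only correlation between two (C, H, W) feature maps.

    single_direction=0 searches shifts in [-d, d].
    single_direction=-1 searches shifts in [-d, 0] only
    (this sign convention is an assumption of the sketch).
    """
    _, h, w = feat_a.shape
    d = max_displacement
    if single_direction == 0:
        shifts = range(-d, d + 1)
    else:
        shifts = range(-d, 1)

    # Zero-pad the width of the second map so every shift stays in bounds.
    padded = np.pad(feat_b, ((0, 0), (0, 0), (d, d)))

    out = np.empty((len(shifts), h, w), dtype=np.float64)
    for i, s in enumerate(shifts):
        # shifted[:, :, x] equals feat_b[:, :, x + s], or zero out of bounds.
        shifted = padded[:, :, d + s : d + s + w]
        # One output channel per shift: channel-wise mean of the products.
        out[i] = (feat_a * shifted).mean(axis=0)
    return out
```

In this sketch, inputs of shape (C, H, W) with max_displacement = 40 give 81 output channels in two-direction mode and 41 channels with single_direction = -1.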

Yes, in Table 2 we calculated the EPE on the entire Scene Flow test set (all 4370 image pairs). CRL did not report results on the entire test set, so for Table 3 we removed some images just as CRL did, for a fair comparison.
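
For readers who want to reproduce a number of this kind, here is a minimal sketch of an EPE computation over a full test set. It assumes that EPE for disparity is the mean absolute difference between predicted and ground-truth disparity over all pixels, and that the dataset score is the average of the per-image values; both function names are hypothetical and this is not the authors' evaluation script.

```python
import numpy as np


def end_point_error(pred_disp, gt_disp):
    """Mean absolute disparity error over every pixel of one image."""
    pred = np.asarray(pred_disp, dtype=np.float64)
    gt = np.asarray(gt_disp, dtype=np.float64)
    return float(np.abs(pred - gt).mean())


def dataset_epe(pairs):
    """Average the per-image EPE over (prediction, ground truth) pairs.

    No image or pixel is discarded; averaging per image first is an
    assumption of this sketch.
    """
    return float(np.mean([end_point_error(p, g) for p, g in pairs]))
```

Filtering the test set as CRL did would then amount to dropping the excluded pairs before calling `dataset_epe`.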

long2double commented 6 years ago

Hello, I have a question. In "FlowNet: Learning Optical Flow with Convolutional Networks", the feature map produced by the correlation layer has size w×h×D², where D = 2d+1. In your paper, "Correlation with a large displacement (i.e., 40)" and "Correlation with a small displacement (i.e., 20)" give D = 2×40+1 = 81 and D = 2×20+1 = 41, so the feature maps should have sizes w×h×(81×81) and w×h×(41×41), and the channel counts would not be 81 and 41. If the output has 81 channels, then D = 9 and d = 4; but if it has 41 channels, no integer D and d work, because 41 is not a perfect square.
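
The two counting rules mentioned in this thread can be put side by side. A minimal sketch, assuming the 2D rule D² with D = 2d+1 quoted above for FlowNet and the 1D rule 2d+1 given earlier by the author for two-direction correlation; the function names are hypothetical, and the d+1 count for the single-direction case is an assumption.

```python
def channels_2d(d):
    """Channels when the search covers a (2d+1) x (2d+1) window."""
    return (2 * d + 1) ** 2


def channels_1d(d, single_direction=False):
    """Channels when the search runs along one axis only.

    Two-direction: 2d+1 shifts. Single-direction: d+1 shifts (assumed).
    """
    return d + 1 if single_direction else 2 * d + 1
```

With d = 40 and d = 20, `channels_1d` gives 81 and 41, matching Table 1, whereas `channels_2d` reaches 81 only at d = 4.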

long2double commented 6 years ago

Thank you very much!!