visinf / irr

Iterative Residual Refinement for Joint Optical Flow and Occlusion Estimation (CVPR 2019)
Apache License 2.0

Question about occlusion estimation #52

Open JamesYang110043 opened 1 month ago

JamesYang110043 commented 1 month ago

Hi @hurjunhwa

First, I'd like to thank you for your excellent work on this project. It has been incredibly valuable for my research.

I have a question regarding the ablation study in Table 1 of your paper. Specifically, I'm curious about the "Occ" module. Is the module in Table 1 the same as the one highlighted in red in Figure 5? If so, could you explain why incorporating the "Occ" module improves the performance of optical flow estimation?

Thank you for your time and assistance.

[Two screenshots attached: Screenshot 2024-07-27 at 1 32 45 AM, Screenshot 2024-07-27 at 1 33 50 AM]

hurjunhwa commented 1 month ago

Hi James, thanks for your interest in our work. Yes, the Occ module in Table 1 corresponds to the occlusion decoder in Fig. 5.

Optical flow and occlusion estimation are complementary tasks. The intermediate estimation of occlusion at the previous pyramid level is input to the flow decoder at the next level and provides a useful cue for flow estimation.

Also, gradients that are backpropagated from the occlusion decoder affect the feature encoder and can make features more discriminative for flow estimation. Hope this helped!

JamesYang110043 commented 1 month ago

Thank you for your explanation. However, I'm having trouble understanding the part where you mentioned, "The intermediate estimation of occlusion at the previous pyramid level is input to the flow decoder at the next level and provides a useful cue for flow estimation."

From my understanding of the code, the optical flow and occlusion are calculated separately at each pyramid level, and I didn't see the occlusion from the previous level being used as input for the optical flow in the next level. Could you please clarify this part?

Or do you mean that, although the occlusion estimate is not passed explicitly to the flow decoder, the predictions at each level (both optical flow and occlusion) are part of the input features for the next level, and that this provides an indirect cue that helps improve the accuracy of the estimation?

https://github.com/visinf/irr/blob/dacd07b1dc963fb8d3db7c75b562691af33f47b2/models/flownet1s_irr_occ.py#L80C9-L124C62

    # Flow Decoder
    predict_flow6        = self._predict_flow6(conv6_1)

    upsampled_flow6_to_5 = self._upsample_flow6_to_5(predict_flow6)
    deconv5              = self._deconv5(conv6_1)
    concat5              = concatenate_as((conv5_1, deconv5, upsampled_flow6_to_5), conv5_1, dim=1)
    predict_flow5        = self._predict_flow5(concat5)

    upsampled_flow5_to_4 = self._upsample_flow5_to_4(predict_flow5)
    deconv4              = self._deconv4(concat5)
    concat4              = concatenate_as((conv4_1, deconv4, upsampled_flow5_to_4), conv4_1, dim=1)
    predict_flow4        = self._predict_flow4(concat4)

    upsampled_flow4_to_3 = self._upsample_flow4_to_3(predict_flow4)
    deconv3              = self._deconv3(concat4)
    concat3              = concatenate_as((conv3_1, deconv3, upsampled_flow4_to_3), conv3_1, dim=1)
    predict_flow3        = self._predict_flow3(concat3)

    upsampled_flow3_to_2 = self._upsample_flow3_to_2(predict_flow3)
    deconv2              = self._deconv2(concat3)
    concat2              = concatenate_as((conv2_im1, deconv2, upsampled_flow3_to_2), conv2_im1, dim=1)
    predict_flow2        = self._predict_flow2(concat2)

    # Occ Decoder
    predict_occ6 = self._predict_occ6(conv6_1)

    upsampled_occ6_to_5 = self._upsample_occ6_to_5(predict_occ6)
    deconv_occ5         = self._deconv_occ5(conv6_1)
    concat_occ5         = concatenate_as((conv5_1, deconv_occ5, upsampled_occ6_to_5), conv5_1, dim=1)
    predict_occ5        = self._predict_occ5(concat_occ5)

    upsampled_occ5_to_4 = self._upsample_occ5_to_4(predict_occ5)
    deconv_occ4         = self._deconv_occ4(concat_occ5)
    concat_occ4         = concatenate_as((conv4_1, deconv_occ4, upsampled_occ5_to_4), conv4_1, dim=1)
    predict_occ4        = self._predict_occ4(concat_occ4)

    upsampled_occ4_to_3 = self._upsample_occ4_to_3(predict_occ4)
    deconv_occ3         = self._deconv_occ3(concat_occ4)
    concat_occ3         = concatenate_as((conv3_1, deconv_occ3, upsampled_occ4_to_3), conv3_1, dim=1)
    predict_occ3        = self._predict_occ3(concat_occ3)

    upsampled_occ3_to_2 = self._upsample_occ3_to_2(predict_occ3)
    deconv_occ2         = self._deconv_occ2(concat_occ3)
    concat_occ2         = concatenate_as((conv2_im1, deconv_occ2, upsampled_occ3_to_2), conv2_im1, dim=1)
    predict_occ2        = self._predict_occ2(concat_occ2)

JamesYang110043 commented 3 weeks ago

Hi @hurjunhwa, could you please explain this part? It would be very helpful to me. Thank you.

hurjunhwa commented 2 weeks ago

Hi @JamesYang110043, sorry for the late reply! Yes, you are totally right. The two decoders are completely separate. In the upsampling layer at the end of the network in Fig. 5, the estimated flow is part of the input. That's where flow explicitly helps to refine the occlusion estimate.

In general, gradients that are backpropagated from the occlusion decoder update the feature encoder as well, so that's where the occlusion decoder indirectly helps the flow estimation. I think feature visualization might give a better explanation.

Thanks! Junhwa
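
To illustrate the indirect effect described in the reply above, here is a minimal sketch that is not taken from the IRR code base. It assumes a toy scalar model with one shared encoder weight and two separate heads. All names (`encoder_grad`, `w_enc`, `w_flow`, `w_occ`) and the squared-error losses are hypothetical choices made for the illustration. It shows only that an occlusion loss adds a term to the gradient of the shared encoder even though the two heads never exchange outputs.

```python
# Toy scalar model (hypothetical, unrelated to the actual IRR layers):
#   feature = w_enc * x          shared encoder
#   flow    = w_flow * feature   flow head
#   occ     = w_occ * feature    occlusion head, separate from the flow head


def encoder_grad(x, w_enc, w_flow, w_occ, flow_gt, occ_gt, use_occ_loss):
    """Return d(loss)/d(w_enc) for squared-error losses on the two heads."""
    feature = w_enc * x
    flow = w_flow * feature
    occ = w_occ * feature

    # Chain rule through the flow head: d/dw_enc of (flow - flow_gt)^2
    grad = 2.0 * (flow - flow_gt) * w_flow * x

    if use_occ_loss:
        # Extra term that reaches the encoder only through the occlusion head
        grad += 2.0 * (occ - occ_gt) * w_occ * x
    return grad


# Arbitrary illustrative values
params = dict(x=1.0, w_enc=0.5, w_flow=2.0, w_occ=3.0, flow_gt=2.0, occ_gt=0.0)

flow_only = encoder_grad(use_occ_loss=False, **params)
joint = encoder_grad(use_occ_loss=True, **params)

print("encoder gradient, flow loss only:", flow_only)
print("encoder gradient, flow + occ loss:", joint)
```

With the occlusion loss enabled, the shared encoder weight receives a different update, so the features consumed by the flow head change as well. Whether that change actually improves flow accuracy is an empirical question that this toy example does not address.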

JamesYang110043 commented 2 days ago

Thank you for your reply. I understand it now, and this has been very helpful to me.