mli0603 / stereo-transformer

Revisiting Stereo Depth Estimation From a Sequence-to-Sequence Perspective with Transformers. (ICCV 2021 Oral)
Apache License 2.0
659 stars 107 forks

About rr loss #86

Closed ZhileiChen99 closed 1 year ago

ZhileiChen99 commented 1 year ago

Hi, thanks for your wonderful work. I have some questions about the rr loss. As explained in the main paper, the rr loss is divided into two parts: matched pixels and unmatched pixels. However, I find that the loss for matched pixels also includes unmatched pixels:

    def _compute_gt_location(self, scale: int, sampled_cols: Tensor, sampled_rows: Tensor,
                             attn_weight: Tensor, disp: Tensor):
        """
        Find target locations using ground truth disparity.
        Find ground truth response at those locations using attention weight.
        :param scale: high-res to low-res disparity scale
        :param sampled_cols: index to downsample columns
        :param sampled_rows: index to downsample rows
        :param attn_weight: attention weight (output from _optimal_transport), [N,H,W,W]
        :param disp: ground truth disparity
        :return: response at ground truth location [N,H,W,1] and target ground truth locations [N,H,W,1]
        """
        # compute target location at full res
        _, _, w = disp.size()
        pos_l = torch.linspace(0, w - 1, w)[None,].to(disp.device)  # 1 x 1 x W (left)
        target = (pos_l - disp)[..., None]  # N x H x W (left) x 1

        if sampled_cols is not None:
            target = batched_index_select(target, 2, sampled_cols)
        if sampled_rows is not None:
            target = batched_index_select(target, 1, sampled_rows)
        target = target / scale  # scale target location

        # compute ground truth response location for rr loss
        gt_response = torch_1d_sample(attn_weight, target, 'linear')  # NxHxW_left

        return gt_response, target

The gt_response does not exclude unmatched pixels via occ_mask. Is this a bug? I think this may affect the prediction of occluded pixels.

mli0603 commented 1 year ago

Hi @ZhileiChen99

Thank you for your interest in the project. Even though the computation of gt_response that you referred to does not take occlusion into account, the loss computation (see below) does exclude the occluded region. I hope this helps!

https://github.com/mli0603/stereo-transformer/blob/d0aa1ad9c84f3dab15a2f2a9ead2ca6cf9fe8971/module/loss.py#L88-L125
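For readers following along, here is a minimal sketch of how the two parts of the rr loss described in the paper can be separated with an occlusion mask. It is not the repository's exact code (see the linked loss.py for that); the names rr_loss_sketch and gt_response_occ are illustrative assumptions, with gt_response_occ standing for the attention mass assigned to the "no-match" position.

    import torch

    def rr_loss_sketch(gt_response, gt_response_occ, occ_mask, eps=1e-6):
        """Illustrative relative-response loss, not the repository's exact implementation.

        gt_response:     [N,H,W] attention mass sampled at the ground-truth disparity location
        gt_response_occ: [N,H,W] attention mass assigned to the 'no-match' position (assumed name)
        occ_mask:        [N,H,W] bool, True where the left-image pixel is occluded
        """
        # matched pixels: penalize low response at the true matching location
        loss_matched = -torch.log(gt_response[~occ_mask] + eps)
        # unmatched (occluded) pixels: penalize low response at the no-match position
        loss_unmatched = -torch.log(gt_response_occ[occ_mask] + eps)
        return torch.cat([loss_matched, loss_unmatched]).mean()

The point is that even if gt_response is computed for every pixel, masking with occ_mask at loss time means occluded pixels never contribute to the matched-pixel term.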