pmj110119 / RenderOcc

[ICRA 2024] RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision. (Former version: UniOcc)

How to use `camera_mask` during training? #37

Closed: getterupper closed this issue 2 weeks ago

getterupper commented 6 months ago

Hi, when I train RenderOcc with `camera_mask`, my results peak at an mIoU of 30.53, well below the 40 to 50 reported by UniOcc. Could you please share how you used `camera_mask` during training? My current approach is:

    def loss_3d(self, voxel_semantics, camera_mask, density_prob, semantic):
        # Flatten everything to one entry per voxel.
        voxel_semantics = voxel_semantics.long().reshape(-1)
        density_prob = density_prob.reshape(-1, 2)
        semantic = semantic.reshape(-1, self.num_classes - 1)
        # Cast to bool in case the mask is stored as uint8.
        camera_mask = camera_mask.reshape(-1).bool()

        # Label 17 is treated as "free": it is the positive density target
        # and is excluded from the semantic loss.
        density_target = (voxel_semantics == 17).long()
        semantic_mask = voxel_semantics != 17

        # Geometry loss: keep only voxels visible to the cameras.
        density_prob = density_prob[camera_mask]
        density_target = density_target[camera_mask]

        # Semantic loss: keep voxels that are both visible and non-free.
        valid_mask = torch.logical_and(semantic_mask, camera_mask)
        voxel_semantics = voxel_semantics[valid_mask]
        semantic = semantic[valid_mask]

        # compute loss
        loss_geo = self.loss_occ(density_prob, density_target)
        loss_sem = self.semantic_loss(semantic, voxel_semantics)

        loss_ = dict()
        loss_['loss_3d_geo'] = loss_geo
        loss_['loss_3d_sem'] = loss_sem
        return loss_
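
To check the masking logic on its own, outside the training loop, here is a minimal standalone sketch of the same steps on a toy grid. It rests on several assumptions that are not from the repository: plain cross-entropy stands in for `self.loss_occ` and `self.semantic_loss`, whose definitions are not shown here, `density_prob` and `semantic` are treated as raw logits, and the name `masked_loss_3d`, the grid shape and the extra voxel counts are purely illustrative.

```python
import torch
import torch.nn.functional as F

NUM_CLASSES = 18  # assumed: 17 semantic classes plus "free" at index 17
FREE_LABEL = 17


def masked_loss_3d(voxel_semantics, camera_mask, density_prob, semantic):
    """Toy stand-in for loss_3d; returns losses plus voxel counts for debugging."""
    voxel_semantics = voxel_semantics.reshape(-1).long()
    camera_mask = camera_mask.reshape(-1).bool()  # uint8 -> bool
    density_prob = density_prob.reshape(-1, 2)
    semantic = semantic.reshape(-1, NUM_CLASSES - 1)

    density_target = (voxel_semantics == FREE_LABEL).long()
    semantic_mask = voxel_semantics != FREE_LABEL
    valid_mask = semantic_mask & camera_mask  # visible and non-free

    # Assumption: plain cross-entropy on logits replaces the real loss modules.
    loss_geo = F.cross_entropy(density_prob[camera_mask], density_target[camera_mask])
    loss_sem = F.cross_entropy(semantic[valid_mask], voxel_semantics[valid_mask])
    return {
        "loss_3d_geo": loss_geo,
        "loss_3d_sem": loss_sem,
        "num_visible": int(camera_mask.sum()),
        "num_valid": int(valid_mask.sum()),
    }


torch.manual_seed(0)
shape = (1, 4, 4, 2)  # hypothetical (batch, X, Y, Z) grid
voxel_semantics = torch.randint(0, NUM_CLASSES, shape)
voxel_semantics[0, 0, 0, 0] = 3  # guarantee one visible, non-free voxel
camera_mask = torch.zeros(shape, dtype=torch.uint8)
camera_mask[:, :2] = 1  # mark half of the grid as visible
density_prob = torch.randn(*shape, 2)
semantic = torch.randn(*shape, NUM_CLASSES - 1)

losses = masked_loss_3d(voxel_semantics, camera_mask, density_prob, semantic)
print(losses)
```

Printing `num_visible` and `num_valid` alongside the losses shows how many voxels actually contribute to each term, which is one quick way to spot a mask with an unexpected dtype or shape.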