yuxumin / PoinTr

[ICCV 2021 Oral] PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers

Will you release the code for SSC? #91

Open fereenwong opened 1 year ago

fereenwong commented 1 year ago

Thanks for your great work! I noticed that you applied this method to the SSC task in your paper "AdaPoinTr: Diverse Point Cloud Completion with Adaptive Geometry-Aware Transformers". Will you release the code for SSC?

yuxumin commented 1 year ago

Hi, thanks for your interest in our work. Yes, we plan to release the code for PoinTr (and other methods like Sketch, DDRNet) on the SSC task. However, we are currently focusing on other work, so the code release will be delayed.

fereenwong commented 1 year ago

> Hi, thanks for your interest in our work. Yes, we plan to release the code for PoinTr (and other methods like Sketch, DDRNet) on the SSC task. However, we are currently focusing on other work, so the code release will be delayed.

Thanks for your timely reply! I'll stay tuned!

JimPlayboy commented 1 year ago

Thanks for your great work! I ran into a problem where the number of query voxels exceeds the GPU memory limit, and I would appreciate your help!

As described in Fig. 7 of your AdaPoinTr paper, I tried to add your method as a plug-in block to an existing SSC method called VoxFormer. I modified the forward method of PCTransformer and added it to the head of VoxFormer. The modified forward method of PCTransformer is as follows:

```python
def forward(self, xyz, query_xyz):
    bs = xyz.size(0)
    coor, f = self.grouper(xyz, self.center_num)  # b n c
    pe = self.pos_embed(coor)
    x = self.input_proj(f)
    x = self.encoder(x + pe, coor)  # b n c
    # use the externally supplied voxel centers as query positions
    queries = self.query_embed.weight.to(xyz.device)
    query_pe = self.pos_embed(query_xyz)
    queries = self.decoder(q=queries + query_pe, v=x, q_pos=query_xyz, v_pos=coor, denoise_length=0)
    return queries
```
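
For reference, a quick shape check of this adapted forward (my own sketch, not part of the repo; `pointr_transformer` stands for the modified PCTransformer instance, and the sizes mirror my setup):

```python
import torch

# Hypothetical smoke test for the adapted forward above:
# 44315 proposal (unmasked) voxel centers, 262144 query voxel centers.
xyz = torch.rand(1, 44315, 3).cuda()         # unmasked voxel centers, normalized to [0, 1)
query_xyz = torch.rand(1, 262144, 3).cuda()  # all voxel centers, normalized to [0, 1)
queries = pointr_transformer(xyz, query_xyz)
print(queries.shape)                         # expected: torch.Size([1, 262144, 384])
```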

The forward method of VoxFormerHead is as follows:

```python
def forward(self, mlvl_feats, img_metas, target):
    bs, num_cam, _, _, _ = mlvl_feats[0].shape  # [1,5,128,24,77] input image features: batch size 1, 5 images, ResNet backbone
    dtype = mlvl_feats[0].dtype  # float32
    bev_queries = self.bev_embed.weight.to(dtype)  # [262144 (128*128*16), dim 128]

    # Generate bev positional embeddings for cross and self attention
    bev_pos_cross_attn = self.positional_encoding(torch.zeros((bs, self.grid_size_row, self.grid_size_col), device=bev_queries.device).to(dtype)).to(dtype)  # [1, dim, 128*4, 128*4]
    bev_pos_self_attn = self.positional_encoding(torch.zeros((bs, self.grid_size_row, self.grid_size_col), device=bev_queries.device).to(dtype)).to(dtype)  # [1, dim, 128*4, 128*4]

    # Load query proposals
    proposal = img_metas[0]['proposal'].reshape(128, 128, 16)
    step = 128 // self.bev_w
    proposal_downscale = np.zeros([self.bev_h, self.bev_w, self.bev_z])
    for i in range(step):
        for j in range(step):
            for k in range(step):
                # accumulate with max so any occupied sub-voxel marks the coarse voxel
                # (a plain assignment here would overwrite the previous iterations)
                proposal_downscale = np.maximum(proposal_downscale, proposal[i::step, j::step, k::step])
    # proposal = img_metas[0]['proposal'].reshape(self.bev_h, self.bev_w, self.bev_z)
    unmasked_idx = np.asarray(np.where(proposal_downscale.reshape(-1) > 0)).astype(np.int32)  # (128,128,16) --reshape(-1)--> (262144,) --'>0'--> (262144,) --where--> tuple[(44315,)] --asarray--> (1,44315)
    masked_idx = np.asarray(np.where(proposal_downscale.reshape(-1) == 0)).astype(np.int32)
    vox_coords, ref_3d = self.get_ref_3d()  # vox_coords: voxel coordinates and indices [n_voxels,4] (x,y,z,idx), 0-127; ref_3d: normalized voxel-center coordinates, (0.5-127.5)/128

    unmasked_ref_3d = torch.from_numpy(ref_3d[vox_coords[unmasked_idx[0], 3], :])  # [44315,3]
    unmasked_ref_3d = unmasked_ref_3d.unsqueeze(0).to(bev_queries.device)  # [1,44315,3]

    geo_feats = self.pointr_transformer(unmasked_ref_3d.float(), torch.from_numpy(ref_3d).float().unsqueeze(0).to(bev_queries.device))

    # Compute seed features of query proposals by deformable cross attention
    seed_feats = self.cross_transformer.get_vox_features(  # [1,44315,64]
        mlvl_feats,
        bev_queries,
        self.bev_h,
        self.bev_w,
        ref_3d=ref_3d,
        vox_coords=vox_coords,
        unmasked_idx=unmasked_idx,
        grid_length=(self.real_h / self.bev_h, self.real_w / self.bev_w),
        bev_pos=bev_pos_cross_attn,
        img_metas=img_metas,
        prev_bev=None,
    )

    # Complete voxel features by adding mask tokens
    vox_feats = torch.empty((self.bev_h, self.bev_w, self.bev_z, self.embed_dims), device=bev_queries.device)  # [128,128,16,64]
    vox_feats_flatten = vox_feats.reshape(-1, self.embed_dims)  # [128,128,16,64] -> [262144,64]
    vox_feats_flatten[vox_coords[unmasked_idx[0], 3], :] = seed_feats[0]
    vox_feats_flatten[vox_coords[masked_idx[0], 3], :] = self.mask_embed.weight.view(1, self.embed_dims).expand(masked_idx.shape[1], self.embed_dims).to(dtype)

    # Diffuse voxel features by deformable self attention
    vox_feats_diff = self.self_transformer.diffuse_vox_features(
        mlvl_feats,
        vox_feats_flatten,
        512,
        512,
        ref_3d=ref_3d,
        vox_coords=vox_coords,
        unmasked_idx=unmasked_idx,
        grid_length=(self.real_h / self.bev_h, self.real_w / self.bev_w),
        bev_pos=bev_pos_self_attn,  # [1, dim, 128*4, 128*4]
        img_metas=img_metas,
        prev_bev=None,
    )
    vox_feats_diff = vox_feats_diff.reshape(self.bev_h, self.bev_w, self.bev_z, self.embed_dims)
    geo_feats = geo_feats.reshape(self.bev_h, self.bev_w, self.bev_z, -1)
    x3d = torch.cat([geo_feats, vox_feats_diff], dim=-1)
    input_dict = {
        "x3d": x3d.permute(3, 0, 1, 2).unsqueeze(0),  # [h,w,z,dim] -> [1,dim,h,w,z]
    }
    out = self.header(input_dict)  # [1,20,256,256,32]
    return out
```
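
As a side note, the strided downscale above is meant to keep a coarse voxel occupied if any of its sub-voxels is occupied; here is a standalone sketch verifying that behaviour against a reshape-based block max (a step of 2 is an assumption):

```python
import numpy as np

# Standalone check of the strided max-downscale used above (step assumed to be 2).
proposal = np.random.randint(0, 2, size=(128, 128, 16))
step = 2
down = np.zeros((128 // step, 128 // step, 16 // step), dtype=proposal.dtype)
for i in range(step):
    for j in range(step):
        for k in range(step):
            # any occupied sub-voxel marks the coarse voxel
            down = np.maximum(down, proposal[i::step, j::step, k::step])

# equivalent vectorized form: block max over the sub-voxel axes
ref = proposal.reshape(128 // step, step, 128 // step, step, 16 // step, step).max(axis=(1, 3, 5))
assert (down == ref).all()
```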

However, I ran into the problem of an enormous memory requirement. In my case, the scene is voxelized into a 3D volume of [128, 128, 16] = 262144 voxels, almost twice as many as in your [60, 36, 60] case. The pre-defined query tensor is [1, 262144, 384] and the coordinate tensor is [1, 262144, 3]. The program tried to allocate 256 GiB and failed with "RuntimeError: CUDA out of memory". If I ignore 'denoise_length' in the decoder, the error occurs at line 387 of AdaPoinTr.py, which tries to find the k nearest neighbours within the query points:

File "/home/qx/ZGY/VoxFormer/projects/mmdet3d_plugin/voxformer/dense_heads/geo_voxformer_head.py", line 111, in forward
    geo_feats = self.pointr_transformer(unmasked_ref_3d.float(),torch.from_numpy(ref_3d).float().unsqueeze(0).to(bev_queries.device))
  File "/home/qx/anaconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/qx/ZGY/VoxFormer/projects/mmdet3d_plugin/voxformer/dense_heads/AdaPoinTr.py", line 883, in forward
    queries = self.decoder(q=queries+query_pe, v=x, q_pos=query_xyz, v_pos=coor)
  File "/home/qx/anaconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/qx/ZGY/VoxFormer/projects/mmdet3d_plugin/voxformer/dense_heads/AdaPoinTr.py", line 527, in forward
    q = self.blocks(q, v, q_pos, v_pos, denoise_length=denoise_length)
  File "/home/qx/anaconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/qx/ZGY/VoxFormer/projects/mmdet3d_plugin/voxformer/dense_heads/AdaPoinTr.py", line 387, in forward
    self_attn_idx = knn_point(self.k, q_pos, q_pos)
  File "/home/qx/ZGY/VoxFormer/projects/mmdet3d_plugin/voxformer/utils/Transformer_utils.py", line 26, in knn_point
    sqrdists = square_distance(new_xyz, xyz)
  File "/home/qx/ZGY/VoxFormer/projects/mmdet3d_plugin/voxformer/utils/Transformer_utils.py", line 46, in square_distance
    dist = -2 * torch.matmul(src, dst.permute(0, 2, 1))
RuntimeError: CUDA out of memory. Tried to allocate 256.00 GiB (GPU 0; 31.75 GiB total capacity; 8.24 GiB already allocated; 17.48 GiB free; 12.89 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
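
The 256 GiB figure is exactly the size of a dense float32 pairwise distance matrix over all queries: 262144 × 262144 × 4 B = 2^38 B = 256 GiB. For context, `square_distance` materializes this full [N, N] matrix in one shot; a chunked variant (my own sketch, not code from this repo; the chunk size is an arbitrary assumption) that yields the same kNN indices with bounded peak memory would look like:

```python
import torch

def knn_point_chunked(nsample, xyz, new_xyz, chunk=1024):
    """Drop-in sketch for knn_point(nsample, xyz, new_xyz): processes the query
    points in chunks so the full [N, N] distance matrix is never allocated.
    Peak extra memory is roughly B * chunk * N floats per step
    (about 1 GiB for B=1, chunk=1024, N=262144)."""
    idx_chunks = []
    for start in range(0, new_xyz.shape[1], chunk):
        q = new_xyz[:, start:start + chunk]          # [B, chunk, 3]
        dists = torch.cdist(q, xyz)                  # [B, chunk, N]
        # smallest Euclidean distances give the same indices as squared distances
        idx_chunks.append(dists.topk(nsample, dim=-1, largest=False).indices)
    return torch.cat(idx_chunks, dim=1)              # [B, M, nsample]
```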

If I set 'denoise_length' to 0, the error instead occurs at line 249, which tries to define a mask of shape [262144, 262144] (even as a bool tensor, that is 262144² bytes = 64 GiB). Since the number of voxel queries in your case is about half of ours, have you ever run into similar problems? Could you please help me check this? Thanks a lot!