fereenwong opened this issue 1 year ago
Hi, thanks for your interest in our work. Yes, we plan to release the code for PoinTr (and other methods such as Sketch and DDRNet) on the SSC task. However, we are currently focused on other work, so the code release will be delayed.
Thanks for your timely reply! Looking forward to it!
Thanks for your great work! I ran into a problem where the number of query voxels exceeds the GPU memory limit, and I would really appreciate your help!
As described in Fig. 7 of your AdaPoinTr paper, I tried to add your method as a plug-in block to an existing SSC method, VoxFormer. I modified the forward method of PCTransformer and added it to the head of VoxFormer. The modified forward method of PCTransformer is as follows:
```python
def forward(self, xyz, query_xyz):
    bs = xyz.size(0)
    # downsample the partial input and extract per-center features
    coor, f = self.grouper(xyz, self.center_num)  # b n c
    pe = self.pos_embed(coor)
    x = self.input_proj(f)
    x = self.encoder(x + pe, coor)  # b n c
    # replace the predicted coarse points with externally supplied query
    # coordinates (the voxel centers), keeping the learned query embeddings
    queries = self.query_embed.weight.to(xyz.device)
    query_pe = self.pos_embed(query_xyz)
    queries = self.decoder(q=queries + query_pe, v=x, q_pos=query_xyz, v_pos=coor, denoise_length=0)
    return queries
```
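For context, this is the call pattern I intend (the tensor shapes are from my [128, 128, 16] setup; this is only a shape sketch, not code from the repo):

```python
# xyz:       unmasked voxel centers used as the partial input, e.g. [1, 44315, 3]
# query_xyz: centers of all voxels in the grid, e.g. [1, 262144, 3]
geo_feats = self.pointr_transformer(xyz.float(), query_xyz.float())  # -> [1, 262144, 384]
```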
The forward method of VoxFormerHead is as follows:
```python
def forward(self, mlvl_feats, img_metas, target):
    bs, num_cam, _, _, _ = mlvl_feats[0].shape  # [1, 5, 128, 24, 77]: input image features, batch size 1, 5 camera views, ResNet features
    dtype = mlvl_feats[0].dtype  # float32
    bev_queries = self.bev_embed.weight.to(dtype)  # [262144 (128*128*16), dim 128]

    # Generate BEV positional embeddings for cross and self attention
    bev_pos_cross_attn = self.positional_encoding(torch.zeros((bs, self.grid_size_row, self.grid_size_col), device=bev_queries.device).to(dtype)).to(dtype)  # [1, dim, 128*4, 128*4]
    bev_pos_self_attn = self.positional_encoding(torch.zeros((bs, self.grid_size_row, self.grid_size_col), device=bev_queries.device).to(dtype)).to(dtype)  # [1, dim, 128*4, 128*4]

    # Load query proposals
    proposal = img_metas[0]['proposal'].reshape(128, 128, 16)
    step = 128 // self.bev_w
    proposal_downscale = np.zeros([self.bev_h, self.bev_w, self.bev_z])
    for i in range(step):
        for j in range(step):
            for k in range(step):
                # mark a coarse voxel occupied if any of its sub-voxels is occupied
                proposal_downscale = np.maximum(proposal_downscale, proposal[i::step, j::step, k::step])
    # proposal = img_metas[0]['proposal'].reshape(self.bev_h, self.bev_w, self.bev_z)
    unmasked_idx = np.asarray(np.where(proposal_downscale.reshape(-1) > 0)).astype(np.int32)  # (128,128,16) -> reshape(-1) -> (262144,) -> '>0' -> np.where -> tuple of (44315,) -> np.asarray -> (1, 44315)
    masked_idx = np.asarray(np.where(proposal_downscale.reshape(-1) == 0)).astype(np.int32)
    vox_coords, ref_3d = self.get_ref_3d()  # vox_coords: voxel coordinates and indices [n_voxels, 4] (x, y, z, idx), 0-127; ref_3d: normalized voxel-center coordinates, (0.5..127.5)/128

    unmasked_ref_3d = torch.from_numpy(ref_3d[vox_coords[unmasked_idx[0], 3], :])  # [44315, 3]
    unmasked_ref_3d = unmasked_ref_3d.unsqueeze(0).to(bev_queries.device)  # [1, 44315, 3]
    # PoinTr-style transformer: seed points are the unmasked voxel centers,
    # queries are the centers of all voxels
    geo_feats = self.pointr_transformer(unmasked_ref_3d.float(), torch.from_numpy(ref_3d).float().unsqueeze(0).to(bev_queries.device))

    # Compute seed features of query proposals by deformable cross attention
    seed_feats = self.cross_transformer.get_vox_features(  # [1, 44315, 64]
        mlvl_feats,
        bev_queries,
        self.bev_h,
        self.bev_w,
        ref_3d=ref_3d,
        vox_coords=vox_coords,
        unmasked_idx=unmasked_idx,
        grid_length=(self.real_h / self.bev_h, self.real_w / self.bev_w),
        bev_pos=bev_pos_cross_attn,
        img_metas=img_metas,
        prev_bev=None,
    )

    # Complete voxel features by adding mask tokens
    vox_feats = torch.empty((self.bev_h, self.bev_w, self.bev_z, self.embed_dims), device=bev_queries.device)  # [128, 128, 16, 64]
    vox_feats_flatten = vox_feats.reshape(-1, self.embed_dims)  # [128, 128, 16, 64] -> [262144, 64]
    vox_feats_flatten[vox_coords[unmasked_idx[0], 3], :] = seed_feats[0]
    vox_feats_flatten[vox_coords[masked_idx[0], 3], :] = self.mask_embed.weight.view(1, self.embed_dims).expand(masked_idx.shape[1], self.embed_dims).to(dtype)

    # Diffuse voxel features by deformable self attention
    vox_feats_diff = self.self_transformer.diffuse_vox_features(
        mlvl_feats,
        vox_feats_flatten,
        512,
        512,
        ref_3d=ref_3d,
        vox_coords=vox_coords,
        unmasked_idx=unmasked_idx,
        grid_length=(self.real_h / self.bev_h, self.real_w / self.bev_w),
        bev_pos=bev_pos_self_attn,  # [1, dim, 128*4, 128*4]
        img_metas=img_metas,
        prev_bev=None,
    )
    vox_feats_diff = vox_feats_diff.reshape(self.bev_h, self.bev_w, self.bev_z, self.embed_dims)

    # Concatenate the geometric features from PoinTr with the diffused voxel features
    geo_feats = geo_feats.reshape(self.bev_h, self.bev_w, self.bev_z, -1)
    x3d = torch.cat([geo_feats, vox_feats_diff], dim=-1)
    input_dict = {
        "x3d": x3d.permute(3, 0, 1, 2).unsqueeze(0),  # [h, w, z, dim] -> [1, dim, h, w, z]
    }
    out = self.header(input_dict)  # [1, 20, 256, 256, 32]
    return out
```
However, I ran into an excessive memory requirement. In my case the scene is voxelized into a 3D volume of [128, 128, 16] = 262144 voxels, almost twice the number of voxels in your [60, 36, 60] case. The pre-defined query tensor is [1, 262144, 384] and the coordinate tensor is [1, 262144, 3]. The program tried to allocate 256 GiB and failed with "RuntimeError: CUDA out of memory". If I omit 'denoise_length' in the decoder call, the error occurs at line 387 of AdaPoinTr.py, which tries to find the k nearest neighbours among the query points:
File "/home/qx/ZGY/VoxFormer/projects/mmdet3d_plugin/voxformer/dense_heads/geo_voxformer_head.py", line 111, in forward
geo_feats = self.pointr_transformer(unmasked_ref_3d.float(),torch.from_numpy(ref_3d).float().unsqueeze(0).to(bev_queries.device))
File "/home/qx/anaconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/qx/ZGY/VoxFormer/projects/mmdet3d_plugin/voxformer/dense_heads/AdaPoinTr.py", line 883, in forward
queries = self.decoder(q=queries+query_pe, v=x, q_pos=query_xyz, v_pos=coor)
File "/home/qx/anaconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/qx/ZGY/VoxFormer/projects/mmdet3d_plugin/voxformer/dense_heads/AdaPoinTr.py", line 527, in forward
q = self.blocks(q, v, q_pos, v_pos, denoise_length=denoise_length)
File "/home/qx/anaconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/qx/ZGY/VoxFormer/projects/mmdet3d_plugin/voxformer/dense_heads/AdaPoinTr.py", line 387, in forward
self_attn_idx = knn_point(self.k, q_pos, q_pos)
File "/home/qx/ZGY/VoxFormer/projects/mmdet3d_plugin/voxformer/utils/Transformer_utils.py", line 26, in knn_point
sqrdists = square_distance(new_xyz, xyz)
File "/home/qx/ZGY/VoxFormer/projects/mmdet3d_plugin/voxformer/utils/Transformer_utils.py", line 46, in square_distance
dist = -2 * torch.matmul(src, dst.permute(0, 2, 1))
RuntimeError: CUDA out of memory. Tried to allocate 256.00 GiB (GPU 0; 31.75 GiB total capacity; 8.24 GiB already allocated; 17.48 GiB free; 12.89 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
If I instead set 'denoise_length' to 0, the error occurs at line 249, which tries to build a mask of shape [262144, 262144]. Since the number of voxel queries in your case is about half of ours, have you ever run into similar problems? Could you please help me look into this? Thanks a lot!
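For reference, the 256 GiB matches the dense pairwise-distance matrix that square_distance materializes: 262144 × 262144 float32 values ≈ 256 GiB. A chunked kNN along the following lines (just a sketch against the knn_point(k, xyz, new_xyz) interface above; the chunk size is arbitrary) would avoid that single allocation, although it would not remove the [262144, 262144] self-attention mask at line 249:

```python
import torch

def knn_point_chunked(k, xyz, new_xyz, chunk=4096):
    # Process the query points in chunks so the distance matrix is at most
    # [B, chunk, N] instead of the full [B, M, N] ([1, 262144, 262144] here).
    # xyz: [B, N, 3] source points; new_xyz: [B, M, 3] query points.
    idx_chunks = []
    for start in range(0, new_xyz.shape[1], chunk):
        q = new_xyz[:, start:start + chunk, :]     # [B, c, 3]
        dists = torch.cdist(q, xyz)                # [B, c, N] pairwise distances
        idx_chunks.append(dists.topk(k, dim=-1, largest=False).indices)
    return torch.cat(idx_chunks, dim=1)            # [B, M, k] neighbour indices
```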
Thanks for your great work! I noticed that you applied this method to the SSC task in your paper "AdaPoinTr: Diverse Point Cloud Completion with Adaptive Geometry-Aware Transformers". Will you release the code for SSC?