Hey Cindy, thanks for your interest. Here's the script I used for creating the surface voxel labels. It's a bit messy right now since I haven't had time to clean it up. I believe it will answer all your questions. I'll release the code as soon as possible, probably in Feb.
```python
import numpy as np
import torch


def compute_target(points, img_meta, depth_maps, depth_masks,
                   voxel_size, depth_cast_margin):
    """Project voxel coordinates into each view and mark a voxel as a surface
    voxel when its camera depth matches the ground-truth depth map within a
    margin. `compute_projection` is another helper from the repo; it returns
    the (num_images, 3, 4) projection matrices built from `img_meta`."""
    device = points.device
    n_images = len(img_meta['lidar2img']['extrinsic'])
    H, W = depth_maps.shape[1], depth_maps.shape[2]
    n_x_voxels, n_y_voxels, n_z_voxels = points.shape[-3:]
    points = points.view(1, 3, -1).expand(n_images, 3, -1)
    # Homogeneous coordinates: (num_images, 3+1, num_voxels)
    points = torch.cat((points, torch.ones_like(points[:, :1])), dim=1)
    # Pixel coordinates: (num_images, 3, num_voxels)
    points_2d = torch.bmm(compute_projection(img_meta).to(device), points)
    # (num_images, num_voxels)
    z = points_2d[:, 2]
    x = (points_2d[:, 0] / z).round().long()
    y = (points_2d[:, 1] / z).round().long()
    # Keep voxels that project inside the image and in front of the camera.
    valid = (x >= 0) & (y >= 0) & (x < W) & (y < H) & (z > 0)
    n_voxels = points.shape[-1]
    gt_depth = torch.zeros((n_images, n_voxels), device=device)
    for i in range(n_images):
        # Drop voxels that land on invalid depth pixels, then gather the
        # ground-truth depth at each remaining voxel's projection.
        valid[i, valid[i]] = valid[i, valid[i]] & depth_masks[i, y[i, valid[i]], x[i, valid[i]]]
        gt_depth[i, valid[i]] = depth_maps[i, y[i, valid[i]], x[i, valid[i]]]
    extrinsic = torch.tensor(np.stack(img_meta['lidar2img']['extrinsic'])).to(device)
    # Camera-frame coordinates: (num_images, 3+1, num_voxels)
    points_cam = torch.bmm(extrinsic, points)
    # Voxel depth in each camera: (num_images, num_voxels)
    vx_depth = points_cam[:, 2] / points_cam[:, 3]
    margin = voxel_size[2] * (depth_cast_margin * 0.5)
    for i in range(n_images):
        gt_dep = gt_depth[i, valid[i]]
        vx_dep = vx_depth[i, valid[i]]
        # A voxel is a surface voxel in view i if its camera depth agrees
        # with the depth map within +/- margin.
        valid[i, valid[i]] = valid[i, valid[i]] & ((gt_dep <= vx_dep + margin) &
                                                   (vx_dep - margin <= gt_dep))
    valid = valid.view(n_images, 1, n_x_voxels, n_y_voxels, n_z_voxels)
    # A voxel is positive if it is a surface voxel in at least one view.
    target_occ = valid.sum(dim=0) > 0
    return target_occ
```
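For reference, a self-contained call looks roughly like this. The stand-in compute_projection, the img_meta layout, and all shapes and values below are only illustrative assumptions inferred from the script, not the actual training code:

```python
import numpy as np
import torch

def compute_projection(img_meta):
    # Stand-in for the repo's helper: per-view (3, 4) matrix K[:3, :3] @ E[:3].
    intrinsic = torch.tensor(img_meta['lidar2img']['intrinsic'][:3, :3])
    return torch.stack([intrinsic @ torch.tensor(e[:3])
                        for e in img_meta['lidar2img']['extrinsic']])

n_images, H, W = 20, 480, 640
img_meta = {'lidar2img': {
    'extrinsic': [np.eye(4, dtype=np.float32) for _ in range(n_images)],
    'intrinsic': np.eye(4, dtype=np.float32),
}}
depth_maps = torch.rand(n_images, H, W) * 5.0   # metric depth per pixel
depth_masks = torch.rand(n_images, H, W) > 0.1  # True where depth is valid
voxel_size = torch.tensor([0.2, 0.2, 0.2])
# Voxel coordinates, e.g. from the get_points helper below: (3, nx, ny, nz).
points = torch.rand(3, 40, 40, 16) * 4.0

target_occ = compute_target(points, img_meta, depth_maps, depth_masks,
                            voxel_size, depth_cast_margin=3)
print(target_occ.shape)  # (1, 40, 40, 16), boolean surface-voxel labels
```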
Hi @ttaoREtw, thank you very much for your kind reply.
I have some questions about the arguments of this function, e.g., how is points generated?
Looking forward to your kind reply. Have a nice day!
points can be generated by this script:
```python
import torch


@torch.no_grad()
def get_points(n_voxels, voxel_size, origin):
    """Return a (3, nx, ny, nz) grid of per-voxel xyz coordinates for a
    volume of n_voxels voxels of edge length voxel_size centered at origin."""
    # Integer voxel indices: (3, nx, ny, nz)
    points = torch.stack(torch.meshgrid([
        torch.arange(n_voxels[0]),
        torch.arange(n_voxels[1]),
        torch.arange(n_voxels[2])
    ]))
    # Shift so that the volume is centered at `origin`.
    new_origin = origin - n_voxels / 2. * voxel_size
    points = points * voxel_size.view(3, 1, 1, 1) + new_origin.view(3, 1, 1, 1)
    return points
```
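For example (the grid resolution, voxel size, and origin below are made-up values):

```python
import torch

n_voxels = torch.tensor([40, 40, 16])       # grid resolution along x, y, z
voxel_size = torch.tensor([0.2, 0.2, 0.2])  # voxel edge length in meters
origin = torch.tensor([0.0, 0.0, 1.3])      # volume center in the lidar frame

points = get_points(n_voxels, voxel_size, origin)
print(points.shape)        # torch.Size([3, 40, 40, 16]): xyz per voxel
print(points[:, 0, 0, 0])  # first voxel coordinate, i.e. origin - n/2 * size
```

This is the points tensor that compute_target above takes as its first argument.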
I hope this answers your questions.
Hi, it's great work!
I am very interested in the supervision of the geometry shaping module. If I am not wrong, the input of the geometry module is V (HxWxDxC). It will go through several Conv3D and Conv3D(T) layers and output the geometry shaping weight, which has the same size as V. I am wondering about the detailed steps of getting the ground-truth surface voxels.
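For concreteness, this is roughly how I picture the module; the layer count, the channel widths, and the final sigmoid are purely my guesses, and I am also unsure whether the weight has 1 or C channels:

```python
import torch
import torch.nn as nn


class GeometryShapingGuess(nn.Module):
    # My sketch only: a small 3D encoder-decoder over V that outputs a
    # per-voxel weight. Assumes even spatial dims so shapes round-trip.
    def __init__(self, channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(channels, channels, 3, stride=2, padding=1),           # downsample
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(channels, channels, 4, stride=2, padding=1),  # upsample
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, 1, 1),                                       # weight logits
        )

    def forward(self, v):                  # v: (B, C, H, W, D)
        return torch.sigmoid(self.net(v))  # (B, 1, H, W, D), broadcastable over C
```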
Assume we have 20 images per scene. Then do the "RGB-D frames" here mean converting the 20 depth images to 20 sparse point clouds, so that if a voxel doesn't contain any point from the 20 point clouds its ground-truth value is negative/0, and if it contains points from the 20 point clouds its ground-truth value is positive/1?
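In code, I imagine the rule roughly like this (entirely my own sketch, with made-up argument names):

```python
import torch


def points_to_occupancy(points_world, origin, n_voxels, voxel_size):
    # My guess at the rule above: a voxel is positive iff at least one
    # back-projected depth point from any of the 20 frames falls inside it.
    # points_world: (N, 3) points fused from all depth frames of the scene.
    new_origin = origin - n_voxels / 2. * voxel_size  # grid corner
    idx = torch.floor((points_world - new_origin) / voxel_size).long()  # (N, 3)
    inside = ((idx >= 0) & (idx < n_voxels)).all(dim=1)
    idx = idx[inside]
    occ = torch.zeros(tuple(n_voxels.tolist()), dtype=torch.bool)
    occ[idx[:, 0], idx[:, 1], idx[:, 2]] = True  # voxel contains >= 1 point
    return occ
```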
Another question is about "for each camera ray, we also consider locations neighboring surface voxels within margin as positive". What does "for each camera ray" mean here? In this geometry shaping step, the multi-view features are already fused, so I am confused about the steps of selecting the neighbors.
The third question is about the size of the ground-truth surface voxels. Is it an HxWxDxC tensor with values 0 and 1? And do you then apply a focal loss between it and the predicted weight to supervise the geometry shaping module?
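If so, I would expect something like the standard binary focal loss (the alpha/gamma values below are just the common defaults, my assumption):

```python
import torch
import torch.nn.functional as F


def binary_focal_loss(logits, target, alpha=0.25, gamma=2.0):
    # Standard sigmoid focal loss; logits and target (float 0/1) share a
    # shape such as (B, 1, H, W, D).
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, target, reduction='none')
    p_t = p * target + (1 - p) * (1 - target)              # prob of true class
    alpha_t = alpha * target + (1 - alpha) * (1 - target)  # class balancing
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```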
Really looking forward to your kind reply. Thank you very much!