kazuto1011 / dusty-gan-v2

Generative Range Imaging for Learning Scene Priors of 3D LiDAR Data (WACV 2023)
https://kazuto1011.github.io/dusty-gan-v2
MIT License

Code of load_pts_as_img on nuScenes #3

Closed Huang-yihao closed 11 months ago

Huang-yihao commented 11 months ago

Hello, I revised the function following your suggestion and generated some range-view maps, but I do not know whether they are correct. The load_pts_as_img function is:

import numpy as np

# NOTE: relies on the scatter() helper defined in the author's snippet below.
def load_pts_as_img(point_path, scan_unfolding=True, H=64, W=2048, min_depth=1.45, max_depth=80.0):
    # load xyz & intensity and add depth & mask
    points = np.fromfile(point_path, dtype=np.float32).reshape((-1, 5))
    grid_h = points[:, 4].astype(np.int32)  # nuScenes column 4 = ring index
    grid_h = np.expand_dims(grid_h, axis=1)
    points = points[:, :4]
    xyz = points[:, :3]  # xyz
    x = xyz[:, [0]]
    y = xyz[:, [1]]
    z = xyz[:, [2]]
    depth = np.linalg.norm(xyz, ord=2, axis=1, keepdims=True)
    # mask = (depth > 0).astype(np.float32)
    mask = (depth >= min_depth) & (depth <= max_depth)
    points = np.concatenate([points, depth, mask], axis=1)

    if scan_unfolding:
        # the i-th quadrant
        # suppose the points are ordered counterclockwise
        quads = np.zeros_like(x, dtype=np.int32)
        quads[(x >= 0) & (y >= 0)] = 0  # 1st
        quads[(x < 0) & (y >= 0)] = 1  # 2nd
        quads[(x < 0) & (y < 0)] = 2  # 3rd
        quads[(x >= 0) & (y < 0)] = 3  # 4th

        # split between the 3rd and 1st quadrants
        diff = np.roll(quads, shift=1, axis=0) - quads
        delim_inds, _ = np.where(diff == 3)  # number of lines
        inds = list(delim_inds) + [len(points)]  # add the last index

        # vertical grid
        # grid_h = np.zeros_like(x, dtype=np.int32)
        # cur_ring_idx = H - 1  # ...0
        # for i in reversed(range(len(delim_inds))):
        #     grid_h[inds[i] : inds[i + 1]] = cur_ring_idx
        #     if cur_ring_idx >= 0:
        #         cur_ring_idx -= 1
        #     else:
        #         break
    else:
        fup, fdown = np.deg2rad(3), np.deg2rad(-25)
        pitch = np.arcsin(z / depth) + abs(fdown)
        grid_h = 1 - pitch / (fup - fdown)
        grid_h = np.floor(grid_h * H).clip(0, H - 1).astype(np.int32)

    # horizontal grid
    yaw = -np.arctan2(y, x)  # [-pi,pi]
    grid_w = (yaw / np.pi + 1) / 2 % 1  # [0,1]
    grid_w = np.floor(grid_w * W).clip(0, W - 1).astype(np.int32)

    grid = np.concatenate((grid_h, grid_w), axis=1)

    # projection
    order = np.argsort(-depth.squeeze(1))
    proj_points = np.zeros((H, W, 4 + 2), dtype=points.dtype)
    proj_points = scatter(proj_points, grid[order], points[order])

    return proj_points

And the main code is

import os.path as osp

import matplotlib.pyplot as plt
import torchvision
import torchvision.transforms.functional as TF
from nuscenes.nuscenes import NuScenes
from torchvision.transforms import InterpolationMode

# Load up the nuScenes mini split.
nusc = NuScenes(version='v1.0-mini', dataroot='/mnt/share/yihao/All_dataset/test', verbose=True)
sd_record = nusc.get('sample', 'ca9a282c9e77460f8360f564131a8af5')
ref_sd_token = sd_record['data']['LIDAR_TOP']
ref_sd_record = nusc.get('sample_data', ref_sd_token)
target_H, target_W = 64, 512
dmin, dmax = 1.45, 80.0

# Load pointcloud.
pcl_path = osp.join(nusc.dataroot, ref_sd_record['filename'])
range_image = load_pts_as_img(pcl_path, scan_unfolding=True, H=target_H, W=target_W, min_depth=dmin, max_depth=dmax)

xyzrdm = TF.to_tensor(range_image)  # (6,H,W): XYZ + reflectance + depth + mask
xyzrdm = TF.resize(xyzrdm, (target_H,target_W), InterpolationMode.NEAREST)
xyzrdm *= xyzrdm[[5]]

xyz = xyzrdm[:3].float()
reflectance=xyzrdm[[3]].float()
depth=xyzrdm[[4]].float()
mask=xyzrdm[[5]].float()

# reflectance = TF.normalize(reflectance, 0.24130844, 0.16860831)
# depth = TF.normalize(depth, 9.689281, 10.08752)
# xyz = TF.normalize(xyz, [-0.01506443, 0.45959818, -0.89225304], [11.224804, 8.237693, 0.88183135])

depth = 1 / depth.add(1e-11) * mask
grid = torchvision.utils.make_grid(depth, nrow=2, pad_value=float("nan"))[0].cpu()
fig = plt.figure(constrained_layout=True)
plt.imshow(grid, cmap="turbo", vmin=1 / dmax, vmax=1 / dmin, interpolation="none")
plt.axis("off")
plt.show()

The generated grid map is as follows: [image]

The xyz is as follows: [image]

The reflectance map is as follows: [image]

I think the maps are somewhat correct, but I have two questions: (1) Are the height and width correct? The range map seems to have some regions without information. (2) Do I need to normalize the XYZ, depth, and reflectance?

Thank you very much for the help!

kazuto1011 commented 11 months ago

As I mentioned in #2, nuScenes was built with the Velodyne HDL-32E (namely, 32 beams).

import numba
import numpy as np

@numba.jit(nopython=True, parallel=False)
def scatter(array, index, value):
    for (h, w), v in zip(index, value):
        array[h, w] = v
    return array

def load_points_as_images(
    point_path: str,
    scan_unfolding: bool = True,
    H: int = 32,
    W: int = 1024,
    min_depth: float = 0.0,
    max_depth: float = 100.0,
):
    points = np.fromfile(point_path, dtype=np.float32).reshape((-1, 5))
    ring_id = points[:, [4]]
    xyz = points[:, :3]
    x = xyz[:, [0]]
    y = xyz[:, [1]]
    z = xyz[:, [2]]
    intensity = points[:, [3]]
    depth = np.linalg.norm(xyz, ord=2, axis=1, keepdims=True)
    mask = (depth >= min_depth) & (depth <= max_depth)
    points = np.concatenate([xyz, intensity, depth, mask], axis=1)

    if scan_unfolding:
        grid_h = 31 - ring_id.astype(np.int32)  # upside down
    else:
        fup, fdown = np.deg2rad(10.67), np.deg2rad(-30.67)
        elevation = np.arcsin(z / depth) + abs(fdown)
        grid_h = 1 - elevation / (fup - fdown)
        grid_h = np.floor(grid_h * H).clip(0, H - 1).astype(np.int32)

    azimuth = -np.arctan2(y, x)  # [-pi,pi]
    grid_w = (azimuth / np.pi + 1) / 2 % 1  # [0,1]
    grid_w = np.floor(grid_w * W).clip(0, W - 1).astype(np.int32)

    order = np.argsort(-depth.squeeze(1))
    grid = np.concatenate((grid_h, grid_w), axis=1)
    proj_points = np.zeros((H, W, 4 + 2), dtype=points.dtype)
    proj_points = scatter(proj_points, grid[order], points[order])

    return proj_points.astype(np.float32)

import matplotlib.pyplot as plt

point_path = "v1.0-mini/samples/LIDAR_TOP/n008-2018-08-01-15-16-36-0400__LIDAR_TOP__1533151603547590.pcd.bin"
kwargs = dict(scan_unfolding=True, W=512)
image = load_points_as_images(point_path, **kwargs)

fig, ax = plt.subplots(image.shape[-1], 1, figsize=(10, 5), constrained_layout=True)
fig.suptitle(kwargs)
for i in range(image.shape[-1]):
    ax[i].imshow(image[..., i], cmap="turbo", interpolation="none")
    ax[i].axis("off")
    ax[i].set_title("XYZIDM"[i])
plt.axis("off")
plt.show()

[output image]

Huang-yihao commented 11 months ago

Thank you very much for your quick reply! Could you please keep the issue open? I would like to report problems I run into while trying the depth_to_point_map function.

Huang-yihao commented 11 months ago

I have revised the depth_to_point_map function following your suggestion.

# Velodyne HDL-32E
H, W = 32, 512
h_up, h_down = 10, -30
w_left, w_right = 180, -180
device = "cpu"

elevation = 1 - torch.arange(H, device=device) / H  # [0, 1]
elevation = elevation * (h_up - h_down) + h_down  # [-30, 10]
azimuth = 1 - torch.arange(W, device=device) / W  # [0, 1]
azimuth = azimuth * (w_left - w_right) + w_right  # [-180, 180]
[elevation, azimuth] = torch.meshgrid([elevation, azimuth], indexing="ij")
angle = torch.stack([elevation, azimuth])[None].deg2rad()

def depth_to_point_map(depth,angle):
    assert depth.dim() == 4
    grid_cos = torch.cos(angle)
    grid_sin = torch.sin(angle)
    grid_x = depth * grid_cos[:, [0]] * grid_cos[:, [1]]
    grid_y = depth * grid_cos[:, [0]] * grid_sin[:, [1]]
    grid_z = depth * grid_sin[:, [0]]
    return torch.cat((grid_x, grid_y, grid_z), dim=1)
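
As a quick sanity check (a sketch using the H, W, angle, and depth_to_point_map defined above), the recovered xyz should reproduce the input depth up to floating-point error, since x² + y² + z² = d²cos²φ(cos²θ + sin²θ) + d²sin²φ = d²:

import torch

# Sanity check: |depth_to_point_map(depth)| should equal depth (up to float error).
depth = torch.rand(1, 1, H, W) * 80.0   # dummy range image in meters
xyz = depth_to_point_map(depth, angle)  # (1,3,H,W)
recovered = xyz.pow(2).sum(dim=1, keepdim=True).sqrt()
assert torch.allclose(recovered, depth, atol=1e-3)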

If I use the depth map generated by the previous load_pts_as_img function and feed it into the depth_to_point_map function to obtain points, I expect those points to be the same as, or close to, the points that went into load_pts_as_img (i.e., loaded from the .bin file). But I find the shapes are inconsistent, and the points generated by depth_to_point_map cannot be fed into load_pts_as_img again. Could you please point out the error I made? Thank you very much!

range_image = load_points_as_images(points, scan_unfolding=True, H=target_H, W=target_W,min_depth=dmin,max_depth=dmax)
depth_for_next = TF.to_tensor(range_image[...,3]).unsqueeze(0)
output_points = depth_to_point_map(depth_for_next,angle)
points_for_next = output_points.flatten(2).squeeze().numpy().transpose(1,0)
ring_id_for_next = np.expand_dims(range_image[:, :, [4]].flatten(), axis=1)
intensity_for_next = np.expand_dims(range_image[:, :, [3]].flatten(), axis=1)
data_for_next = np.concatenate((points_for_next, ring_id_for_next,intensity_for_next), axis=1)
range_image_new = load_points_as_images(data_for_next, scan_unfolding=True, H=target_H, W=target_W, min_depth=dmin, max_depth=dmax)

kazuto1011 commented 11 months ago

Huang-yihao commented 11 months ago

Thank you for the response. I have adjusted the code, but I find the range image strange.

H, W = 32, 512
h_up, h_down = 10.67, -30.67
w_left, w_right = 180, -180
device = "cpu"

elevation = 1 - torch.arange(H, device=device) / H  # [0, 1]
elevation = elevation * (h_up - h_down) + h_down  # [-30.67, 10.67]
azimuth = 1 - torch.arange(W, device=device) / W  # [0, 1]
azimuth = azimuth * (w_left - w_right) + w_right  # [-180, 180]
[elevation, azimuth] = torch.meshgrid([elevation, azimuth], indexing="ij")
angle = torch.stack([elevation, azimuth])[None].deg2rad()

def depth_to_point_map(depth,angle):
    assert depth.dim() == 4
    grid_cos = torch.cos(angle)
    grid_sin = torch.sin(angle)
    grid_x = depth * grid_cos[:, [0]] * grid_cos[:, [1]]
    grid_y = depth * grid_cos[:, [0]] * grid_sin[:, [1]]
    grid_z = depth * grid_sin[:, [0]]
    return torch.cat((grid_x, grid_y, grid_z), dim=1)

depth_for_next = TF.to_tensor(range_image[...,3]).unsqueeze(0)
output_points = depth_to_point_map(depth_for_next,angle)
compare_points = np.fromfile(pcl_path, dtype=np.float32).reshape((-1, 5))

points_for_next = output_points.flatten(2).squeeze().numpy().transpose(1,0)
new_x = np.pad(points_for_next[:,0],(0,compare_points.shape[0]-points_for_next.shape[0]),'constant',constant_values = (0))
new_y = np.pad(points_for_next[:,1],(0,compare_points.shape[0]-points_for_next.shape[0]),'constant',constant_values = (0))
new_z = np.pad(points_for_next[:,2],(0,compare_points.shape[0]-points_for_next.shape[0]),'constant',constant_values = (0))
new_x = np.expand_dims(new_x,axis = 1)
new_y = np.expand_dims(new_y,axis = 1)
new_z = np.expand_dims(new_z,axis = 1)
points_for_next = np.concatenate((new_x,new_y,new_z),axis=1)
intensity_for_next = compare_points[:, [3]]
ring_id_for_next = compare_points[:, [4]]

data_for_next = np.concatenate((points_for_next, intensity_for_next, ring_id_for_next), axis=1)
range_image = load_points_as_images(data_for_next, scan_unfolding=True, H=target_H, W=target_W,min_depth=dmin,max_depth=dmax)

The output is like the following: [image]

I am confused about which setting is wrong. Could you please point out the error from the output?

Huang-yihao commented 11 months ago

I have revised the code so that it is somewhat correct, but the range map seems to lack information: there are black lines in it. Is this a limitation of the depth_to_point_map function? That is, can the method only achieve this accuracy from the given point cloud?

# Velodyne HDL-32E
H, W = 32, 512
h_up, h_down = 10.67, -30.67
w_left, w_right = 180, -180
device = "cpu"

elevation = 1 - torch.arange(H, device=device) / H  # [0, 1]
elevation = elevation * (h_up - h_down) + h_down  # [-30.67, 10.67]
azimuth = 1 - torch.arange(W, device=device) / W  # [0, 1]
azimuth = azimuth * (w_left - w_right) + w_right  # [-180, 180]
[elevation, azimuth] = torch.meshgrid([elevation, azimuth], indexing="ij")
angle = torch.stack([elevation, azimuth])[None].deg2rad()

def depth_to_point_map(depth,angle):
    assert depth.dim() == 4
    grid_cos = torch.cos(angle)
    grid_sin = torch.sin(angle)
    grid_x = depth * grid_cos[:, [0]] * grid_cos[:, [1]]
    grid_y = depth * grid_cos[:, [0]] * grid_sin[:, [1]]
    grid_z = depth * grid_sin[:, [0]]
    return torch.cat((grid_x, grid_y, grid_z), dim=1)

depth_for_next = TF.to_tensor(range_image[...,4]).unsqueeze(0)
output_points = depth_to_point_map(depth_for_next,angle)
compare_points = np.fromfile(pcl_path, dtype=np.float32).reshape((-1, 5))

points_for_next = output_points.flatten(2).squeeze().numpy().transpose(1,0)
repeat_ratio = int(compare_points.shape[0] / points_for_next.shape[0]) + 1
new_x = points_for_next[:, 0].repeat(repeat_ratio)[:compare_points.shape[0]]
new_y = points_for_next[:, 1].repeat(repeat_ratio)[:compare_points.shape[0]]
new_z = points_for_next[:, 2].repeat(repeat_ratio)[:compare_points.shape[0]]
new_x = np.expand_dims(new_x, axis=1)
new_y = np.expand_dims(new_y, axis=1)
new_z = np.expand_dims(new_z, axis=1)
intensity_for_next = range_image[:, :, [3]].flatten().repeat(repeat_ratio)[:compare_points.shape[0]]
ring_id_for_next = range_image[:, :, [4]].flatten().repeat(repeat_ratio)[:compare_points.shape[0]]
intensity_for_next = np.expand_dims(intensity_for_next,axis = 1)
ring_id_for_next = np.expand_dims(ring_id_for_next,axis = 1)

data_for_next = np.concatenate((new_x,new_y,new_z,intensity_for_next,ring_id_for_next),axis=1)
range_image = load_points_as_images(data_for_next, scan_unfolding=None, H=target_H, W=target_W,min_depth=dmin,max_depth=dmax)

[image]

kazuto1011 commented 11 months ago

That's because you switched scan_unfolding to None in the second projection, which ignores the ring IDs. If you want a consistent result, keep the ring IDs to project back; otherwise the spherical projection (scan_unfolding=False) is recommended for both the first and the second projections, as in many papers, although the line artifacts remain. Scan unfolding (scan_unfolding=True) can produce a dense image like the raw scan data, but the angular spacing is slightly irregular, which can become inconsistent under re-projection, as in your example.

[The image is from https://github.com/ltriess/kitti_scan_unfolding]

Besides, both your depth and ring_id are taken from range_image[..., 4].
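
For what it's worth, here is a minimal round-trip sketch under the settings recommended above (spherical projection in both directions). It reuses pcl_path, angle, depth_to_point_map, and load_points_as_images from the earlier comments, and assumes a hypothetical variant of load_points_as_images that accepts an (N, 5) array instead of a file path:

import numpy as np
import torch

# Round trip with scan_unfolding=False both ways; range image channels are XYZIDM.
range_image = load_points_as_images(pcl_path, scan_unfolding=False, H=32, W=512)

depth = torch.from_numpy(range_image[..., 4])[None, None]  # (1,1,H,W), channel 4 = depth
xyz = depth_to_point_map(depth, angle)                     # (1,3,H,W)

points = xyz.flatten(2).squeeze(0).T.numpy()               # (H*W, 3)
intensity = range_image[..., 3].reshape(-1, 1)             # channel 3 = intensity
ring_id = np.zeros_like(intensity)                         # unused when scan_unfolding=False
valid = range_image[..., 5].reshape(-1).astype(bool)       # channel 5 = mask

data = np.concatenate([points, intensity, ring_id], axis=1)[valid]
# hypothetical array-accepting variant of load_points_as_images():
range_image_2 = load_points_as_images_from_array(data, scan_unfolding=False, H=32, W=512)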

Huang-yihao commented 11 months ago

Thank you for the response!

  1. Since the ring_id of the original point cloud may be inconsistent with the point cloud generated from the range-view map, this may lead to terrible results, far from the range-view map generated by the original point cloud. I think using ring_id may not be a good choice. What is your suggestion?
  2. I tried using scan_unfolding=False for both the first and the second projections. Can I change the scatter code to the following? I find this may return a fair result, shown below. Do you have any suggestions?

    def scatter(array, index, value):
        for (h, w), v in zip(index, value):
            if np.max(array[h, w]) == 0:
                array[h, w] = v
            else:
                if h - 1 >= 0:
                    if np.max(array[h - 1, w]) == 0:
                        array[h - 1, w] = v
                if h + 1 < array.shape[0]:
                    if np.max(array[h + 1, w]) == 0:
                        array[h + 1, w] = v
        return array

    [image]

kazuto1011 commented 11 months ago

  1. I agree that the ring IDs can be inconsistent before/after projection, but as addressed above, your snippet does not seem to calculate ring IDs from the point clouds; it just flattens the depth values.
  2. Are you trying to infill the missing values? Assigning depth to neighboring angles can produce such a dense image, but it looks kind of ad hoc. The artifacts are caused by a discrepancy between the defined and the actual sensor angles. In my opinion, with the spherical projection, the options are identifying the correct angles per pixel or letting the holes be.

The spherical projection & letting the holes be is the most popular approach, to my understanding.
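
To illustrate "identifying the correct angles per pixel": a rough sketch that estimates each ring's actual elevation from one scan (using the nuScenes ring IDs in column 4, and assuming all 32 rings appear in the scan), then bins arbitrary points to the nearest measured ring angle instead of uniform bins:

import numpy as np

# Estimate per-ring elevation angles from one raw scan.
points = np.fromfile(pcl_path, dtype=np.float32).reshape(-1, 5)
xyz, ring = points[:, :3], points[:, 4].astype(np.int32)
depth = np.linalg.norm(xyz, axis=1)
elev = np.arcsin(xyz[:, 2] / np.maximum(depth, 1e-8))

# median elevation of each of the 32 rings (assumes every ring has points)
ring_elev = np.array([np.median(elev[ring == i]) for i in range(32)])

# for unsorted points: assign each point to the nearest measured ring angle
grid_h = np.abs(elev[:, None] - ring_elev[None, :]).argmin(axis=1)
grid_h = 31 - grid_h  # upside down, matching load_points_as_images()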

Huang-yihao commented 11 months ago

Thank you! Following your suggestion, calculating the ring ID from the point cloud may be the better choice for me, because my research requires range-view images with no holes. Then, how can I calculate ring IDs from point clouds? I can only get xyz information from the generated point cloud, plus intensity and depth information from the range-view map, and this information does not seem directly related to the ring ID (to my knowledge).

Huang-yihao commented 11 months ago

I referred to https://github.com/VincentCheungM/Run_based_segmentation/issues/3#issuecomment-452686184 for the calculation of ring_id, but that method requires the data to be sorted. Since I am not familiar with the KITTI and nuScenes data, could you please tell me how to sort the point cloud generated from the range map to meet the requirements of the ring_id calculation algorithm?

kazuto1011 commented 11 months ago

In nuScenes, the ring IDs (vertical pixel index) take (0, 1, 2, ..., 31) numbers, and each is paired with an elevation angle $\phi$ of a laser beam. Basically, the ID/angle can only be identified by converting Cartesian coordinates (x, y, z) into spherical coordinates (r, $\phi$, $\theta$); you don't need to sort them. However, the analyzed ring IDs might differ from the original beam IDs depending on sensor calibration. The method you addressed tackles this problem. Fortunately, the officially released points are already sorted in scan order like (0, 0, 0, ..., 31, 31, 31, 31) where each ID group is sorted by azimuth order, so you just have to cut at boundaries and stack them to make range images. This is the scan unfolding (compare it with my code).

In conclusion, if you just have unsorted point clouds without the given IDs, the method has no advantage. Of course, you can compute (r, $\phi$, $\theta$), assign IDs, and finally sort the points, but the result is the same as the simple spherical projection, since the IDs are from the angles.
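
For concreteness, a sketch of that last option (compute (r, $\phi$, $\theta$), assign IDs, sort into scan order), given an (N, 5) nuScenes array points; as noted, the resulting IDs are just the spherical-projection row indices:

import numpy as np

# (x, y, z) -> spherical coordinates -> ID assignment -> scan-order sort.
x, y, z = points[:, 0], points[:, 1], points[:, 2]
r = np.sqrt(x**2 + y**2 + z**2)
phi = np.arcsin(z / np.maximum(r, 1e-8))  # elevation
theta = -np.arctan2(y, x)                 # azimuth in [-pi, pi]

fup, fdown = np.deg2rad(10.67), np.deg2rad(-30.67)
ring = np.floor((1 - (phi + abs(fdown)) / (fup - fdown)) * 32)
ring = ring.clip(0, 31).astype(np.int32)  # row index, not the hardware ring ID

order = np.lexsort((theta, ring))         # sort by ring, then azimuth
points_sorted = points[order]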

kazuto1011 commented 11 months ago

The (x, y, z) -> ring ID part is already here:

fup, fdown = np.deg2rad(10.67), np.deg2rad(-30.67)
elevation = np.arcsin(z / depth) + abs(fdown)
grid_h = 1 - elevation / (fup - fdown)
grid_h = np.floor(grid_h * H).clip(0, H - 1).astype(np.int32)

Huang-yihao commented 11 months ago

Thank you for the response. I do not fully understand https://github.com/kazuto1011/dusty-gan-v2/issues/3#issuecomment-1788512972, but I gather that, with scan_unfolding=False, I can get the ring_id by

fup, fdown = np.deg2rad(10.67), np.deg2rad(-30.67)
elevation = np.arcsin(z / depth) + abs(fdown)
grid_h = 1 - elevation / (fup - fdown)
grid_h = np.floor(grid_h * H).clip(0, H - 1).astype(np.int32)

This ring_id only allows me to obtain a range map like the following: [image]

So there is no better method to obtain a range map without holes from the unsorted points generated by the depth_to_point_map function. Is this understanding correct?

kazuto1011 commented 11 months ago

Yes, I think so. Try adjusting the angles [-30.67, 10.67] to make the image a bit better.
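
One hedged way to adjust the angles is to measure the effective vertical field of view of a scan rather than relying on the datasheet values; a sketch (pcl_path as above):

import numpy as np

# Estimate the effective vertical FOV of one scan for use as fdown/fup.
points = np.fromfile(pcl_path, dtype=np.float32).reshape(-1, 5)
depth = np.linalg.norm(points[:, :3], axis=1)
valid = depth > 1e-3
elev = np.degrees(np.arcsin(points[valid, 2] / depth[valid]))

# percentiles are more robust to outliers than min/max
fdown_deg, fup_deg = np.percentile(elev, [0.1, 99.9])
print(f"suggested angles: [{fdown_deg:.2f}, {fup_deg:.2f}]")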

Huang-yihao commented 11 months ago

Thank you very much!