slothfulxtx / MBPTrack3D

[ICCV2023] MBPTrack: Improving 3D Point Cloud Tracking with Memory Networks and Box Priors

Skewed predictions #6

Closed MaxTeselkin closed 11 months ago

MaxTeselkin commented 1 year ago

Hi! I am running MBPTrack on a custom tracklet (basically a part of the KITTI dataset). The predictions look close to reality, but they are a little skewed (I have attached an image below: as you can see, the model's prediction is shifted to the left and sits lower than the correct cuboid). Is this normal model performance, or am I probably doing something wrong?

image

My code for inference is almost the same as your code from here:

def predict(
        self,
        frames,
    ):
        torch.set_grad_enabled(False)
        pred_bboxes = []
        memory = None
        lwh = None
        last_bbox_cpu = np.array([0.0, 0.0, 0.0, 0.0])

        for frame_idx, frame in enumerate(frames):
            # First frame starts from the ground truth box; later frames
            # start from the previous prediction
            if frame_idx == 0:
                base_bbox = frame["bbox"]
                # reorder wlh -> lwh
                lwh = np.array([base_bbox.wlh[1], base_bbox.wlh[0], base_bbox.wlh[2]])
            else:
                base_bbox = pred_bboxes[-1]
            # Crop the frame's point cloud around the base box and center it
            pcd = crop_and_center_pcd(
                frame["pcd"],
                base_bbox,
                offset=self.cfg.dataset_cfg.frame_offset,
                offset2=self.cfg.dataset_cfg.frame_offset2,
                scale=self.cfg.dataset_cfg.frame_scale,
            )
            if frame_idx == 0:
                # Guard against an empty crop, then build the ground truth
                # foreground mask used to initialize the memory
                if pcd.nbr_points() == 0:
                    pcd.points = np.array([[0.0], [0.0], [0.0]])
                bbox = transform_box(frame["bbox"], base_bbox)
                mask_gt = get_pcd_in_box_mask(pcd, bbox, scale=1.25).astype(int)
                # bbox_gt = np.array([bbox.center[0], bbox.center[1], bbox.center[2], (
                #     bbox.orientation.degrees if self.cfg.dataset_cfg.degree else bbox.orientation.radians) * bbox.orientation.axis[-1]])
                pcd, idx = resample_pcd(
                    pcd, self.cfg.dataset_cfg.frame_npts, return_idx=True, is_training=False
                )
                mask_gt = mask_gt[idx]
            else:
                # Nearly empty crop: skip the network and extrapolate the
                # previous box by the last predicted offset
                if pcd.nbr_points() <= 1:
                    bbox = get_offset_box(
                        pred_bboxes[-1],
                        last_bbox_cpu,
                        use_z=self.cfg.dataset_cfg.eval_cfg.use_z,
                        is_training=False,
                    )
                    pred_bboxes.append(bbox)
                    continue
                pcd, idx = resample_pcd(
                    pcd, self.cfg.dataset_cfg.frame_npts, return_idx=True, is_training=False
                )
            # Extract per-point features ("embed" stage)
            embed_output = self.model(
                dict(
                    pcds=torch.tensor(pcd.points.T, device=self.device, dtype=torch.float32)
                    .unsqueeze(0)
                    .unsqueeze(0)
                ),
                mode="embed",
            )
            xyzs, geo_feats, idxs = (
                embed_output["xyzs"],
                embed_output["feats"],
                embed_output["idxs"],
            )

            if frame_idx == 0:
                # First frame: propagate features and initialize the memory
                # with the ground truth mask
                first_mask_gt = torch.tensor(
                    mask_gt, device=self.device, dtype=torch.float32
                ).unsqueeze(0)
                propagate_output = self.model(
                    dict(
                        feat=geo_feats[:, 0, :, :],
                        xyz=xyzs[:, 0, :, :],
                        first_mask_gt=torch.gather(first_mask_gt, 1, idxs[:, 0, :]),
                    ),
                    mode="propagate",
                )
                layer_feats = propagate_output["layer_feats"]
                update_output = self.model(
                    dict(
                        layer_feats=layer_feats,
                        xyz=xyzs[:, 0, :, :],
                        mask=torch.gather(first_mask_gt, 1, idxs[:, 0, :]),
                    ),
                    mode="update",
                )
                memory = update_output["memory"]
                pred_bboxes.append(frame["bbox"])
            else:
                # Later frames: propagate the memory to the current frame,
                # then localize the target
                propagate_output = self.model(
                    dict(memory=memory, feat=geo_feats[:, 0, :, :], xyz=xyzs[:, 0, :, :]),
                    mode="propagate",
                )
                geo_feat, mask_feat = propagate_output["geo_feat"], propagate_output["mask_feat"]
                layer_feats = propagate_output["layer_feats"]

                localize_output = self.model(
                    dict(
                        geo_feat=geo_feat,
                        mask_feat=mask_feat,
                        xyz=xyzs[:, 0, :, :],
                        lwh=torch.tensor(lwh, device=self.device, dtype=torch.float32).unsqueeze(0),
                    ),
                    mode="localize",
                )
                mask_pred = localize_output["mask_pred"]
                bboxes_pred = localize_output["bboxes_pred"]
                bboxes_pred_cpu = bboxes_pred.squeeze(0).detach().cpu().numpy()

                # Replace NaNs so the argmax over proposal scores stays valid
                bboxes_pred_cpu[np.isnan(bboxes_pred_cpu)] = -1e6

                # Pick the proposal with the highest score (column 4)
                best_box_idx = bboxes_pred_cpu[:, 4].argmax()
                bbox_cpu = bboxes_pred_cpu[best_box_idx, 0:4]
                # If the target is likely missing, fall back to extrapolating
                # from the last reliable offset
                if torch.max(mask_pred.sigmoid()) < self.cfg.missing_threshold:
                    bbox = get_offset_box(
                        pred_bboxes[-1],
                        last_bbox_cpu,
                        use_z=self.cfg.dataset_cfg.eval_cfg.use_z,
                        is_training=False,
                    )
                else:
                    bbox = get_offset_box(
                        pred_bboxes[-1],
                        bbox_cpu,
                        use_z=self.cfg.dataset_cfg.eval_cfg.use_z,
                        is_training=False,
                    )
                    last_bbox_cpu = bbox_cpu

                pred_bboxes.append(bbox)
                # Update the memory with the predicted mask (not needed after
                # the last frame)
                if frame_idx < len(frames) - 1:
                    update_output = self.model(
                        dict(
                            layer_feats=layer_feats,
                            xyz=xyzs[:, 0, :, :],
                            mask=mask_pred.sigmoid(),
                            memory=memory,
                        ),
                        mode="update",
                    )
                    memory = update_output["memory"]
        return pred_bboxes
MaxTeselkin commented 1 year ago

I also tried changing some of the model parameters, such as cfg.dataset_cfg.frame_offset and cfg.dataset_cfg.frame_npts, but it didn't solve my problem.

slothfulxtx commented 12 months ago

Can you visualize the input point clouds of MBPTrack instead of the raw data? Maybe something goes wrong during preprocessing or post-processing. For example, you can visualize pcds

embed_output = self.model(
    dict(
        pcds=torch.tensor(pcd.points.T, device=self.device, dtype=torch.float32)
        .unsqueeze(0)
        .unsqueeze(0)
    ),
    mode="embed",
)

with its mask_gt or mask_pred, as well as the pred_bboxes.
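
Something along these lines could work as a quick check (a rough sketch, not code from this repo; it assumes xyzs of shape (B, T, N, 3) and mask_pred of shape (B, N) taken from your inference loop above, with mask_pred aligned to the points in xyzs):

import numpy as np
import open3d

# Rough sketch: color the model's sampled points by the predicted mask
# (red = predicted foreground, gray = background).
xyz = xyzs[0, 0].detach().cpu().numpy().astype(np.float64)
fg = (mask_pred.sigmoid()[0] > 0.5).detach().cpu().numpy()
colors = np.where(fg[:, None], [1.0, 0.0, 0.0], [0.6, 0.6, 0.6])

vis_pcd = open3d.geometry.PointCloud()
vis_pcd.points = open3d.utility.Vector3dVector(xyz)
vis_pcd.colors = open3d.utility.Vector3dVector(colors)
open3d.visualization.draw_geometries([vis_pcd])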

MaxTeselkin commented 12 months ago

@slothfulxtx I tried visualizing the input data and predictions using the script proposed in CXTrack. I had to change it because I am working over remote SSH + a devcontainer in VS Code, and I get the OpenGL error that is typical for open3d in that setup. So I switched to another open3d visualization option based on plotly:

import open3d

input_pcd = input_frame["pcd"]
vis_pcd = open3d.geometry.PointCloud()
point_xyz = input_pcd.points.T
vis_pcd.points = open3d.utility.Vector3dVector(point_xyz)
open3d.visualization.draw_plotly_server([vis_pcd], width=1920, height=1080)

It worked for me. Here is the result of my input pointcloud visualization:

image image

And here is how the original pointcloud should look:

image image
MaxTeselkin commented 12 months ago

As you can see from the images above, my pointcloud looks quite elongated compared to the original. Pay attention to the cars: after converting my pointcloud to your format, they look very deformed (they are stretched vertically).

Here is how I convert my pointclouds to your format:

import os
import open3d
import numpy as np
from datasets.utils import PointCloud

pcd = open3d.io.read_point_cloud(os.path.join(self.pc_dir, cloud_info.name))
points = np.asarray(pcd.points, dtype=np.float32)
pcd = PointCloud(points.T)

My pointclouds are stored in .pcd format. What am I doing wrong? Why does my pointcloud look so deformed after converting it to your PointCloud class? Is it necessary to apply some additional transformations for the pointcloud to be converted correctly to your format? @slothfulxtx

MaxTeselkin commented 12 months ago

One more important fact: I ran an experiment to test my scripts for converting input bounding boxes between my format and yours. I converted a bounding box from my format to yours, converted the result back to my format, and visualized it; it looks exactly the same as the input bounding box. So the box conversion scripts work fine. I think the problem originates from an incorrect pointcloud transformation: open3d and plotly visualize my pcd correctly, but it appears deformed after conversion to your PointCloud class, and I think this is why the model performs so badly on my data.

MaxTeselkin commented 12 months ago

You can download my input pointcloud here.

Here is my (incorrect?) code for converting to your PointCloud class:

import open3d
import numpy as np
from datasets.utils import PointCloud

pcd = open3d.io.read_point_cloud(path_to_pointcloud)
points = np.asarray(pcd.points, dtype=np.float32)
pcd = PointCloud(points.T)
vis_pcd = open3d.geometry.PointCloud()
point_xyz = pcd.points.T
vis_pcd.points = open3d.utility.Vector3dVector(point_xyz)
open3d.visualization.draw_plotly_server([vis_pcd], width=1920, height=1080)
MaxTeselkin commented 12 months ago

If you can guide me on how to convert a .pcd file to your PointCloud class correctly, I will be very grateful.

slothfulxtx commented 11 months ago

@MaxTeselkin Hi, I've just loaded your 23.pcd with the following code. It seems that the skewed visualization is caused by an incorrect call to draw_plotly_server([vis_pcd], width=1920, height=1080). I'm not sure about that, because I cannot call this function on my computer, so I used the original visualization method from CXTrack.

import open3d
from point_cloud import PointCloud
import numpy as np

pcd = open3d.io.read_point_cloud('./23.pcd')
points = np.asarray(pcd.points, dtype=np.float32)
pcd = PointCloud(points.T)
vis_pcd = open3d.geometry.PointCloud()
point_xyz = pcd.points.T
point_color = np.tile(np.array((0, 0, 1.0)), (point_xyz.shape[0], 1))
vis_pcd.points = open3d.utility.Vector3dVector(point_xyz)
vis_pcd.colors = open3d.utility.Vector3dVector(point_color)

vis = open3d.visualization.VisualizerWithKeyCallback()
vis.create_window(width=1920, height=1080)
vis.get_render_option().load_from_json("./open3d_render_option.json")

vis.add_geometry(vis_pcd)
vis.run()

image

slothfulxtx commented 11 months ago

Besides, here's the version of open3d on my PC; maybe it could be helpful.

image

I receive this bug when using your visualization script:

image

MaxTeselkin commented 11 months ago

@slothfulxtx Yeah, draw_plotly_server was added in newer open3d versions (I have 0.17.0). The problem was caused by the fact that the same rotation matrix is interpreted differently by MBPTrack and my company's internal tool for working with pointclouds. I had to add np.pi / 2 to the third coordinate of the rotation vector to make it work correctly. Do you rotate the BoundingBox class somehow?

MaxTeselkin commented 11 months ago

Here is my full plotly-based visualization script for showing a pointcloud and a bounding box - it helped me find the cause of the problem:

import open3d as o3d
import numpy as np
import plotly.graph_objects as go
from datasets.utils import BoundingBox
from scipy.spatial.transform import Rotation
from pyquaternion import Quaternion

def extract_xyz(pcd_file):
    # Load the point cloud
    pcd = o3d.io.read_point_cloud(pcd_file)

    # Convert to numpy array
    points = np.asarray(pcd.points)

    # Extract x, y and z
    x = points[:, 0]
    y = points[:, 1]
    z = points[:, 2]

    return x, y, z

def get_box_vertexes(center, size, rotation):
    rot = Rotation.from_rotvec(rotation)
    rot_mat = rot.as_matrix()
    orientation = Quaternion(matrix=rot_mat)
    bbox = BoundingBox(center, size, orientation)

    return bbox.corners()

def visualize_with_plotly(pcd_x, pcd_y, pcd_z, box_x, box_y, box_z):
    fig = go.Figure(data=[
    go.Scatter3d(
        x=pcd_x,
        y=pcd_y,
        z=pcd_z,
        mode='markers',
        marker=dict(
            size=2,
            opacity=0.8
        )),
    go.Mesh3d(
        x=box_x,
        y=box_y,
        z=box_z,
        i=[7, 0, 0, 0, 4, 4, 6, 6, 4, 0, 3, 2],
        j=[3, 4, 1, 2, 5, 6, 5, 2, 0, 1, 6, 3],
        k=[0, 7, 2, 3, 6, 7, 1, 1, 5, 5, 7, 6],
        opacity=0.3,
        color='#DC143C',
        flatshading=True
    ),
    ])

    fig.update_layout(
        margin=dict(l=0, r=0, b=0, t=0),
        scene=dict(
            xaxis_title='X',
            yaxis_title='Y',
            zaxis_title='Z',
            xaxis=dict(range=[-100, 100]),
            yaxis=dict(range=[-80, 80]),
            zaxis=dict(range=[-30, 30])
        ),
    )

    fig.show()

if __name__ == "__main__":
    file_path = "./pointclouds/0000000023.pcd" # your pcd path
    pcd_x, pcd_y, pcd_z = extract_xyz(file_path)
    box_x, box_y, box_z = get_box_vertexes(
        center=[19.788434758363664, 8.789811614313502, -0.798320147505247], # your bbox center coordinates
        size = [1.6056261, 3.8312221, 1.8019634], # your bbox width, length and height
        rotation=[0, 0, 3.14] # your bbox rotation vector - x (pitch), y (roll), z (yaw)
    )
    visualize_with_plotly(pcd_x, pcd_y, pcd_z, box_x, box_y, box_z)
MaxTeselkin commented 11 months ago

Also, you were right that the deformed pointcloud visualization was caused by plotly. I had to increase the x, y, and z axis limits, and after that the pointcloud started looking normal.
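
For anyone hitting the same thing: instead of hand-tuning the axis ranges, it seems cleaner to let plotly keep the axes proportional to the data. A minimal self-contained sketch:

import numpy as np
import plotly.graph_objects as go

# With aspectmode="data" the 3D scene axes stay proportional to the data,
# so the point cloud is not stretched along any axis.
pts = np.random.rand(500, 3) * [100.0, 20.0, 5.0]  # dummy elongated cloud
fig = go.Figure(go.Scatter3d(x=pts[:, 0], y=pts[:, 1], z=pts[:, 2],
                             mode='markers', marker=dict(size=2)))
fig.update_layout(scene=dict(aspectmode="data"))
fig.show()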

slothfulxtx commented 11 months ago

For the bounding box rotation problem, the following code may be helpful for understanding our coordinate system: https://github.com/slothfulxtx/MBPTrack3D/blob/e5670d122a8fc129189ef826ec470bbeb51eb5ef/datasets/kitti_mem.py#L211-L219

MaxTeselkin commented 11 months ago

@slothfulxtx As I mentioned above, I solved the problem by modifying the rotation vector as follows: align the bounding box pitch (x) and roll (y) by setting them to 0, and rotate the yaw by 90 degrees (your bbox yaw (z) = my bbox yaw (z) + np.pi / 2). After that, the predictions started looking good. Regarding the piece of code you linked above: as far as I understand, it converts data from the KITTI format to the format required by MBPTrack. But I am not converting from KITTI; I convert from my company's own format, so that snippet probably won't be useful in my case.
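
Concretely, the adjustment looked roughly like this (a sketch, using the same [x (pitch), y (roll), z (yaw)] rotation vector format as in my visualization script above):

import numpy as np

# Sketch of the fix: zero out pitch and roll, then shift yaw by 90 degrees
# so the box orientation matches what MBPTrack expects.
rotation = np.array([0.0, 0.0, 3.14])  # example rotation vector in my format
rotation[0] = 0.0                      # align pitch (x)
rotation[1] = 0.0                      # align roll (y)
rotation[2] += np.pi / 2               # rotate yaw (z) by 90 degrees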

MaxTeselkin commented 11 months ago

I have solved my problem, so I am closing the issue.