Prediction seems inaccurate in Embodied Object Understanding 😭

qizekun / ShapeLLM

[ECCV 2024] ShapeLLM: Universal 3D Object Understanding for Embodied Interaction

https://qizekun.github.io/shapellm/

Apache License 2.0


Prediction seems inaccurate in Embodied Object Understanding 😭 #19

Open Bokai-Ji opened 1 month ago

Bokai-Ji commented 1 month ago

Thank you for the excellent work! I ran into a problem: I can't get reasonable predictions for Embodied Object Understanding.

Here is an example of the prediction:

USER: Where I can grasp to open the upper drawer?
ASSISTANT: To open the upper drawer, you can grasp the round fixed handle at the bounding box [[-0.55, -0.07, 0.45], [-0.55, -0.1, 0.45], [-0.55, -0.1, 0.42], [-0.55, -0.07, 0.42], [-0.51, -0.07, 0.45], [-0.51, -0.1, 0.45], [-0.51, -0.1, 0.42], [-0.51, -0.07, 0.42]].

The rendered result is shown in the figure above, where the predicted bounding box is far from the ground-truth position. I tried several objects in the PartNet-Mobility dataset, and none of the predictions is even close to the ground truth. Could this be caused by a mismatch between the axes of the point clouds and those of the predicted bounding boxes? I'm currently using the preprocessing code from mm_utils.py, which is

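import numpy as np

def pc_norm(pc):
    """ pc: NxC, return NxC """
    # Center the cloud at its centroid.
    centroid = np.mean(pc, axis=0)
    pc = pc - centroid
    # Scale so the farthest point lies on the unit sphere.
    m = np.max(np.sqrt(np.sum(pc ** 2, axis=1)))
    if m < 1e-6:
        # Degenerate cloud (all points coincide): return zeros.
        pc = np.zeros_like(pc)
    else:
        pc = pc / m
    return pc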

I'd appreciate any support!
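
For anyone reproducing this, a minimal sketch of turning an answer like the one quoted above into corner points and box edges for rendering. The function names `parse_box` and `box_edges` are hypothetical and not part of ShapeLLM; the answer format is assumed from this single example, and the edge rule assumes an axis-aligned box, which holds for the box shown here.

```python
import ast
import re
from itertools import combinations


def parse_box(answer):
    """Extract 8 (x, y, z) corners from a model answer (format assumed from the example)."""
    match = re.search(r"\[\[.*\]\]", answer, flags=re.DOTALL)
    if match is None:
        raise ValueError("no bounding box found in answer")
    corners = ast.literal_eval(match.group(0))
    if len(corners) != 8 or any(len(c) != 3 for c in corners):
        raise ValueError("expected 8 corners with 3 coordinates each")
    return [tuple(float(v) for v in c) for c in corners]


def box_edges(corners, tol=1e-9):
    """Index pairs of corners that differ along exactly one axis (axis-aligned boxes only)."""
    edges = []
    for i, j in combinations(range(len(corners)), 2):
        differing = sum(abs(a - b) > tol for a, b in zip(corners[i], corners[j]))
        if differing == 1:
            edges.append((i, j))
    return edges


answer = (
    "To open the upper drawer, you can grasp the round fixed handle at the bounding box "
    "[[-0.55, -0.07, 0.45], [-0.55, -0.1, 0.45], [-0.55, -0.1, 0.42], [-0.55, -0.07, 0.42], "
    "[-0.51, -0.07, 0.45], [-0.51, -0.1, 0.45], [-0.51, -0.1, 0.42], [-0.51, -0.07, 0.42]]."
)

corners = parse_box(answer)
edges = box_edges(corners)
# Box center: midpoint of the per-axis extent.
center = tuple((min(axis) + max(axis)) / 2 for axis in zip(*corners))
print(len(corners), "corners,", len(edges), "edges, center:", center)
```

The corner list and edge index pairs can then be handed to whatever renderer is in use, for example as the points and lines of an Open3D `LineSet`. Note that the corners are still in the model's normalized space at this point.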

qizekun commented 1 month ago

Hello, thank you for your interest in our work!

First, since we applied normalization during both training and inference, our output bounding boxes are also in the normalized space. You might want to check if the visualization code accounts for this normalization.

Second, the current grounding performance is still in a very early stage, as you can observe from the low values in Table 6 of the paper.
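
To make the normalization point concrete, here is a minimal sketch of keeping the centroid and scale that `pc_norm` computes, so a predicted box can be mapped back to the original coordinates (or a ground-truth box mapped into the normalized space). The helpers `pc_norm_with_params`, `denormalize`, and `normalize_like` are hypothetical names, not functions from the repository, and the point cloud below is synthetic.

```python
import numpy as np


def pc_norm_with_params(xyz):
    """Same math as pc_norm, but also returns the centroid and scale it used."""
    centroid = np.mean(xyz, axis=0)
    centered = xyz - centroid
    scale = float(np.max(np.sqrt(np.sum(centered ** 2, axis=1))))
    if scale < 1e-6:
        # Degenerate cloud: mirror pc_norm and return zeros.
        return np.zeros_like(centered), centroid, scale
    return centered / scale, centroid, scale


def denormalize(points, centroid, scale):
    """Map points from the normalized space back to the original coordinates."""
    return np.asarray(points, dtype=float) * scale + centroid


def normalize_like(points, centroid, scale):
    """Map points (e.g. a ground-truth box) into the same normalized space."""
    return (np.asarray(points, dtype=float) - centroid) / scale


# Synthetic stand-in for an object's xyz coordinates (assumption, not real data).
rng = np.random.default_rng(0)
xyz = rng.uniform(-2.0, 5.0, size=(1000, 3))
normalized, centroid, scale = pc_norm_with_params(xyz)

# Predicted box from the thread, expressed in the normalized space.
box_norm = np.array([
    [-0.55, -0.07, 0.45], [-0.55, -0.10, 0.45], [-0.55, -0.10, 0.42], [-0.55, -0.07, 0.42],
    [-0.51, -0.07, 0.45], [-0.51, -0.10, 0.45], [-0.51, -0.10, 0.42], [-0.51, -0.07, 0.42],
])
box_raw = denormalize(box_norm, centroid, scale)
print(box_raw.shape)
```

Either direction works for visualization, as long as the point cloud, the predicted box, and the ground-truth box all end up in the same space before they are drawn together.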

Bokai-Ji commented 1 month ago

Thank you for your quick response!!

I would like to clarify whether I have performed the normalization correctly. I referred to the code in $ShapeLLM/llava/serve/cli.py for loading and normalizing point clouds (process_pts). The process_pts() function first applies random_sample() to the point clouds and then uses pc_norm() for normalization.

Below is my testing code:

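import numpy as np
import open3d as o3d

# pc_norm is the function from mm_utils.py quoted earlier;
# line_set (the predicted box) is constructed elsewhere and not shown here.
pts = np.load("textured_objects/47254/47254.npy")
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(pc_norm(pts[:,:3]))
pcd.colors = o3d.utility.Vector3dVector(pts[:,3:])
o3d.visualization.draw_geometries([pcd, line_set])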

In this code, I simply used pc_norm() to normalize the point clouds after loading them from the .npy file. Am I doing this correctly?

Thank you for your help!
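
One difference between the test code and the inference path described above is the order of operations: `process_pts()` samples first and normalizes second, while the test code normalizes the full cloud. A rough way to see how much that changes the normalized space is sketched below. Here `subsample` is a hypothetical stand-in for `random_sample()` (the real function may behave differently), the point count 8192 is an assumption, and the cloud is synthetic.

```python
import numpy as np


def pc_norm_params(xyz):
    """Centroid and scale that pc_norm would use for this cloud."""
    centroid = xyz.mean(axis=0)
    scale = float(np.linalg.norm(xyz - centroid, axis=1).max())
    return centroid, scale


def subsample(pc, num_points, seed=0):
    """Hypothetical stand-in for random_sample(): uniform random point selection."""
    rng = np.random.default_rng(seed)
    replace = pc.shape[0] < num_points  # sample with replacement only if too few points
    idx = rng.choice(pc.shape[0], size=num_points, replace=replace)
    return pc[idx]


# Synthetic anisotropic cloud, offset from the origin (assumption, not real data).
rng = np.random.default_rng(1)
cloud = rng.normal(size=(20000, 3)) * [1.0, 0.5, 2.0] + [3.0, -1.0, 0.5]

c_full, s_full = pc_norm_params(cloud)
sampled = subsample(cloud, 8192)
c_sub, s_sub = pc_norm_params(sampled)

# Differences expressed in units of the full cloud's normalized space.
centroid_shift = float(np.linalg.norm(c_full - c_sub)) / s_full
scale_ratio = s_sub / s_full
print("centroid shift:", centroid_shift, "scale ratio:", scale_ratio)
```

If both numbers indicate only a small change on the real object, the sampling order alone is unlikely to explain a box that lands far from the ground truth, and the remaining suspects are the axis convention and the model's grounding accuracy.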