val-iisc / 3d-lmnet

Repository for 3D-LMNet: Latent Embedding Matching for Accurate and Diverse 3D Point Cloud Reconstruction from a Single Image [BMVC 2018]
https://val-iisc.github.io/3d-lmnet/
MIT License

Confusion about the PSGN results #1

Closed xiaomingjie closed 6 years ago

xiaomingjie commented 6 years ago

Hi

A few papers on point cloud reconstruction report results on multiple categories. DeformNet (here) reports its own results as well as an evaluation of PSGN; it gets a PSGN result of about 0.13 (CD), while you report 0.05 (Table 3 in your paper).

I had a guess about this: PSGN originally trained its network on multiple categories, and DeformNet used PSGN's network directly to test on a single category. I assumed that you trained PSGN on a single category and tested it on that same category, which would explain why DeformNet's PSGN result is so much higher than yours.

But Dense 3D Object Reconstruction (here) reports a PSGN result of about 0.028 (CD on airplane) instead of your 0.037 (CD on airplane). I have also implemented PSGN several times with several methods and got a similar result (0.028).

As far as I can tell, your paper does not mention how you obtained your PSGN results, how you normalize your point clouds, and so on.

These are my confusions; I am looking forward to your reply.

priyankamandikal commented 6 years ago

Hi,

Point cloud reconstruction metrics are very sensitive to the dimension, scale and orientation of the point clouds that are being compared. Unless there is a standardized method for computing the metrics, we generally cannot compare numbers across papers due to different evaluation strategies that are used.

In our paper, we explain our evaluation methodology in Section 4 - 'Experiments' under 'Evaluation Methodology'. To restate, we randomly sample 1024 points from the point cloud and re-scale it to fit within a bounding box of side length 1 unit. We perform this operation on both the predicted and the ground truth point clouds. We then apply the Iterative Closest Point algorithm (ICP) for finer alignment of the prediction and the ground truth. This last step ensures a fair evaluation of PSGN, which predicts rotated point clouds and hence usually has slight misalignment even after angle correction. All point clouds thus end up in a fixed canonical frame of reference. The Chamfer and EMD metrics are computed once this alignment is done.
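To make the above concrete, here is a rough sketch of the evaluation steps in numpy/scipy. It is only illustrative, not the evaluation code we will release: the plain point-to-point ICP and the squared-distance form of Chamfer below are my assumptions for this sketch.

```python
import numpy as np
from scipy.spatial import cKDTree

def sample_points(pts, n=1024):
    # Randomly sample n points from the cloud
    idx = np.random.choice(len(pts), n, replace=len(pts) < n)
    return pts[idx]

def rescale_to_unit_box(pts):
    # Centre the bounding box at the origin and scale its longest side to 1 unit
    pts = pts - (pts.min(0) + pts.max(0)) / 2.0
    return pts / (pts.max(0) - pts.min(0)).max()

def icp_align(src, tgt, iters=50):
    # Plain point-to-point ICP: nearest-neighbour matches + Kabsch rigid fit
    aligned = src.copy()
    tree = cKDTree(tgt)
    for _ in range(iters):
        _, idx = tree.query(aligned)
        matched = tgt[idx]
        mu_s, mu_t = aligned.mean(0), matched.mean(0)
        H = (aligned - mu_s).T @ (matched - mu_t)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:      # guard against reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mu_t - R @ mu_s
        aligned = aligned @ R.T + t
    return aligned

def chamfer(p1, p2):
    # Symmetric Chamfer distance; the squared-distance form is used here,
    # some papers average unsquared distances instead
    d12, _ = cKDTree(p2).query(p1)
    d21, _ = cKDTree(p1).query(p2)
    return np.mean(d12 ** 2) + np.mean(d21 ** 2)

# pred, gt: (N, 3) arrays in the same canonical frame
# pred = rescale_to_unit_box(sample_points(pred))
# gt   = rescale_to_unit_box(sample_points(gt))
# cd   = chamfer(icp_align(pred, gt), gt)
```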

Regarding the PSGN results, we evaluate directly on the pre-trained models provided by the authors. So the setting in Table 3 of our paper is a multi-category trained model evaluated on individual categories. The numbers differ across papers simply because of the different evaluation strategies. Also note that DeformNet reports numbers after training on 5 categories, while we train on all 13 categories of the r2n2 dataset.

We are in the process of releasing our training code and pre-trained models. We will also release our evaluation code in the coming days.

xiaomingjie commented 6 years ago

I see.

PSGN normalizes the radius of each model's bounding hemisphere to 1 unit and aligns the ground plane. PSGN defines 1 unit as 1/10 of the 3D grid (32*32*32), which means 1 unit = 3.2. DeformNet and Dense 3D Object Reconstruction applied the same normalization to make their results comparable with PSGN.

So I suppose that in your paper 1 unit is just 1, i.e. 1 unit = 1. That makes sense. I think it would be better to state your normalization strategy and the source of your PSGN results in the paper.
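Just to illustrate why the choice of unit alone changes the numbers (a toy example, not either paper's exact protocol): with the squared-distance form of Chamfer, expressing the same pair of clouds with 1 unit = 3.2 instead of 1 unit = 1 multiplies the reported CD by 3.2**2, about 10.24, so numbers in different frames cannot be compared directly.

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer(p1, p2):
    # Squared-distance Chamfer; some papers average unsquared distances instead
    d12, _ = cKDTree(p2).query(p1)
    d21, _ = cKDTree(p1).query(p2)
    return np.mean(d12 ** 2) + np.mean(d21 ** 2)

rng = np.random.default_rng(0)
gt = rng.random((1024, 3))                          # toy "ground truth" cloud in a unit box
pred = gt + 0.01 * rng.standard_normal((1024, 3))   # toy "prediction" close to it

cd_unit = chamfer(pred, gt)                  # clouds expressed with 1 unit = 1
cd_psgn = chamfer(3.2 * pred, 3.2 * gt)      # same clouds expressed with 1 unit = 3.2
print(cd_psgn / cd_unit)                     # -> 3.2**2, i.e. about 10.24
```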

As for your statement 'we train on all 13 categories of the r2n2 dataset', can I take this to mean that your network is also a multi-category trained model that is then evaluated on every single category?

priyankamandikal commented 6 years ago

Yes, our network is trained on 13 categories together and evaluated on every single category.

xiaomingjie commented 6 years ago

One more request: could you release your dataset? You said that you use the 3D-R2N2 dataset, but as far as I know 3D-R2N2 does not provide point clouds since it focuses on voxels.

priyankamandikal commented 6 years ago

The input images are taken directly from the 3d-r2n2 dataset. For the ground truth point clouds, we simply sample points on the surfaces of the mesh models provided by ShapeNet. You can use any point cloud processing library for this purpose. Nevertheless, we will provide the ground truth point clouds when we release the evaluation code.
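For example, with the trimesh library (just one possible option, since any mesh-processing library works; the number of sampled points below is illustrative, not necessarily what we used):

```python
import trimesh  # one possible library; any mesh surface sampler works

def sample_gt_pointcloud(obj_path, num_points=16384):
    # num_points is illustrative; 1024 points are re-sampled at evaluation time
    mesh = trimesh.load(obj_path, force='mesh')                   # merge ShapeNet scene geometry into one mesh
    points, _ = trimesh.sample.sample_surface(mesh, num_points)   # uniform sampling on the surface
    return points                                                 # (num_points, 3) array

# pts = sample_gt_pointcloud('path/to/model.obj')  # path is illustrative
```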

xiaomingjie commented 6 years ago

Thanks very much.

xiaomingjie commented 6 years ago

Sorry to bother you again. You said earlier that you would provide the ground truth point clouds, but now, due to memory constraints, you provide the ShapeNet meshes instead. I totally understand that, but could you release your method for generating point clouds from the meshes, and explain how to package the rendered images and point clouds together so that training can be started? I don't think the current instructions are clear enough to start training with your dataset.