Closed guangkaixu closed 2 years ago
Hi guangkaixu, can you share the mesh for debugging? I suspect that the mesh you're using is not scaled to 64^3 grid which these scripts expect.
Oh yes, I ran the evaluation code on the ScanNet and NYUDepthv2 datasets. The voxel size is set to 2cm and the grid size varies between scenes instead of being 64^3. Is there any suggestion for evaluating on different grids without scaling? Evaluation on a 64^3 grid seems coarse to me. Thanks for the above.
I see. Then maybe you can use the original scripts from ConvOcc (https://github.com/autonomousvision/convolutional_occupancy_networks/blob/master/src/eval.py), which I had modified.
Since these metrics are based on randomly sampled points, you might not get exactly 0 and 1, but the metrics should tend to 0 and 1 as the number of points gets large. The default number of points is already quite large, so you should get numbers very close to 0 for CD and 1 for normals. If you still have issues, I can try debugging at my end if you provide me with the mesh.
Thanks for sharing the original evaluation code for 3D meshes. I tried to modify it but got stuck on getting all points from the mesh. If I follow your evaluation code and sample 100k points from the mesh (about 500k points in total), the sampled points differ between runs, which leads to variation in the metrics. Is there any method to get all points from a mesh other than pointcloud = trimesh.sample(sample_num)
? The evaluation code and demo mesh are released in my repo (https://github.com/guangkaixu/eval_3d_mesh) and thanks again for your support.
Hi @guangkaixu, sorry for the late response, got busy with some deadlines. I had a chance to look at your mesh. With regards to your script, I guess you already discovered the reason why you get a higher Chamfer-L1 and low NC. You have quite a big mesh with 255K vertices, and since both CD and NC are based on random samples, the quality of the metric depends on how many points you sample. In the extreme case, if you sample just 1 point on both the GT and predicted meshes, you'd get a really bad normal consistency and a high Chamfer distance, because this one point can be at one place in GT and at another place in pred. The metric gets more reliable the more samples you use. For your example, I get
{'normals completeness': 0.9103680471111619, 'chamfer-L1': 0.005938444974643876}
for the original number of points, and
{'normals completeness': 0.9770087988483396, 'chamfer-L1': 0.0013580619865905396}
for N=1e7 points
As you can see, it gets better with more samples. However, increasing the number of samples comes at the cost of more memory and time.
Also, you have to be careful with the IoU metric. In your code you're using pitch=1.1875 even though the scale of your mesh is somewhere around 5 units. That pitch was chosen for a resolution of 64, so you should scale the pitch to your mesh's resolution (the best sanity check is to visualize the resulting voxels and see whether they look reasonable).
Thank you for your patient explanation. I found that evaluating point clouds instead of meshes is more reliable since no sampling is involved, but your suggestion is also useful to me!
Hi, thanks for your work on reconstruction. I'm interested in mesh evaluation, but when I run the evaluation in retrieval-fuse/util/mesh_metrics.py (https://github.com/nihalsid/retrieval-fuse/blob/fce90fa6adf349a3c7bb5eb4b57d387d4f6ff46c/util/mesh_metrics.py) on a GT mesh, i.e. using the same mesh as both prediction and target, the chamferL1 is 0.017 and the normals_correctness is 0.801, which should theoretically be 0 and 1 to my understanding. What should I do to get correct 3D mesh evaluation results?