zubair-irshad / CenterSnap

Pytorch code for ICRA'22 paper: "Single-Shot Multi-Object 3D Shape Reconstruction and Categorical 6D Pose and Size Estimation"
https://zubair-irshad.github.io/projects/CenterSnap.html

Inference on custom synthetic renders failing #17

Open maturk opened 1 year ago

maturk commented 1 year ago

Hi @zubair-irshad,

I am trying to run some evaluations on my own dataset of renders of synthetic ShapeNet models (the same models you trained on), but I am failing to run the inference script. It looks like the object detection pipeline fails (heat map outputs), and the resulting point cloud reconstruction and bounding boxes are wrong. The renders contain only a single object in the middle. Here are my input color and depth images: `0_color`, `0_depth`

And here is the output from the inference script:

Peaks output: `2_peaks_output`

Bounding box output: `box3d2`

Point cloud projection output: `projection2`

Let me know if you have any ideas on how to get inference working on these types of synthetic renders. Many thanks!

Matias

zubair-irshad commented 1 year ago

Hi @maturk,

Thanks for your interest in our work. It could be worth looking into the following few things:

  1. What are the camera intrinsics of the ShapeNet renderings? Do they match, or are they at least close to, the camera used to render NOCS Synthetic? Please also see FAQ 1 here to check whether it helps.

  2. In what form is the depth input to the network? I presume your depth is object-centric rather than scene-centric, so there may be a mismatch between how we trained the model and how you are performing inference. Please see the image below showing the scene-level depth we use as input to the model; it can be found under camera_composed_depths here. This is what the original NOCS dataset provides, and we train/infer this way to reduce the sim2real gap, since real depth is usually scene-centric rather than object-centric.

Note that you may train the model on your data from scratch (highly recommended), but since you are interested in zero-shot inference, it would be best to test the model on data that matches the training distribution.

  3. Which checkpoint are you using to perform inference? Please note that the checkpoints we have released only work for real scenes and may be sub-optimal for synthetic scenes (i.e. in the following notebook).
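Regarding point 1, a quick way to sanity-check your renderer is to compare its intrinsic matrix against the one the model was trained with. Below is a minimal sketch; the reference values are the intrinsics commonly reported for the 640x480 NOCS CAMERA (synthetic) renders, and both the reference matrix and the `intrinsics_mismatch` helper are assumptions for illustration, not part of the CenterSnap codebase:

```python
import numpy as np

# Assumed intrinsics for the NOCS CAMERA (synthetic) 640x480 renders;
# verify these against the repo's data loader before relying on them.
NOCS_CAMERA_K = np.array([
    [577.5,   0.0, 319.5],
    [  0.0, 577.5, 239.5],
    [  0.0,   0.0,   1.0],
])

def intrinsics_mismatch(K_custom, K_ref=NOCS_CAMERA_K):
    """Relative differences of focal lengths and principal point vs. the reference."""
    return {
        "fx_rel_err": abs(K_custom[0, 0] - K_ref[0, 0]) / K_ref[0, 0],
        "fy_rel_err": abs(K_custom[1, 1] - K_ref[1, 1]) / K_ref[1, 1],
        "cx_rel_err": abs(K_custom[0, 2] - K_ref[0, 2]) / K_ref[0, 2],
        "cy_rel_err": abs(K_custom[1, 2] - K_ref[1, 2]) / K_ref[1, 2],
    }

# Example: a hypothetical custom 640x480 camera with a wider field of view.
K_custom = np.array([
    [554.3,   0.0, 320.0],
    [  0.0, 554.3, 240.0],
    [  0.0,   0.0,   1.0],
])
errs = intrinsics_mismatch(K_custom)
```

If the relative errors are more than a few percent, the learned heat-map and depth heads see geometry they were never trained on, which matches the failure mode described above.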

(image: example of the scene-level composed depth map used as input to the model)
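For point 2, one way to approximate a scene-centric depth map from an object-centric render is to paste the object depth onto a background plane. This is only a rough sketch of the idea: `compose_scene_depth` is a hypothetical helper, and the background distance and millimetre units are assumptions; check how your renderer scales depth and how the repo's loader reads `camera_composed_depths` before using anything like this:

```python
import numpy as np

def compose_scene_depth(obj_depth_mm, background_mm=2000):
    """Paste an object-centric depth render (0 outside the object) onto a
    flat background plane to mimic a scene-level composed depth map.
    background_mm and the mm units are assumptions, not repo behavior."""
    scene = np.full_like(obj_depth_mm, background_mm)
    mask = obj_depth_mm > 0  # pixels that belong to the object
    scene[mask] = obj_depth_mm[mask]
    return scene

# Usage: a toy 4x4 object-centric depth map with the object at 0.8 m.
obj = np.zeros((4, 4), dtype=np.uint16)
obj[1:3, 1:3] = 800
scene = compose_scene_depth(obj)
```

A flat plane is a crude stand-in for real clutter, but it at least moves the depth input toward the scene-centric statistics the model saw during training.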

Hope this helps!