zubair-irshad / CenterSnap

Pytorch code for ICRA'22 paper: "Single-Shot Multi-Object 3D Shape Reconstruction and Categorical 6D Pose and Size Estimation"
https://zubair-irshad.github.io/projects/CenterSnap.html

Inference on custom synthetic renders failing #17

Open maturk opened 1 year ago

maturk commented 1 year ago

Hi @zubair-irshad,

I am trying to run some evaluations on my own dataset of renders of synthetic ShapeNet models (the same models you trained on), but I am failing to run the inference script. It looks like the object detection pipeline fails (heat map outputs), and the resulting point cloud reconstruction and bounding boxes are wrong. The renders contain only a single object in the middle. Here are my input color and depth images: `0_color`, `0_depth`

And here is the output from the inference script:

Peaks output: `2_peaks_output`

Bounding box output: `box3d2`

Point cloud projection output: `projection2`

Let me know if you have any ideas on how to get inference working on these types of synthetic renders. Many thanks!

Matias

zubair-irshad commented 1 year ago

Hi @maturk,

Thanks for your interest in our work. It could be worth looking into the following few things:

  1. What are the camera intrinsics of the ShapeNet renderings? Do they match, or are they at least close to, the camera used to render NOCS Synthetic? Please also see FAQ 1 here to check whether it helps.

  2. In what form is the depth input to the network? I presume your depth is object-centric rather than scene-centric, so there may be a mismatch between how we trained the model and how you are performing inference. Please see the image below showing the scene-level depth we use as input to the model; it can be found under camera_composed_depths here. This is what the original NOCS dataset provides, and we train/infer this way to reduce the sim2real gap, since real depth is usually scene-centric rather than object-centric.

Note that you may train the model on your data from scratch (highly recommended), but since you are interested in zero-shot inference, it would be best to test the model on data that matches the training distribution.

  3. Which checkpoint are you using to perform inference? Please note that the checkpoints we have released only work for real scenes and may be sub-optimal for synthetic scenes (i.e. in the following notebook).
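Regarding point 1, a quick way to sanity-check your renderer is to compare its intrinsic matrix against the one the model was trained with. Below is a minimal sketch; the reference values are the intrinsics commonly reported for the 640x480 NOCS CAMERA (synthetic) renders, and both the reference matrix and the `intrinsics_mismatch` helper are assumptions for illustration, not part of the CenterSnap codebase:

```python
import numpy as np

# Assumed intrinsics for the NOCS CAMERA (synthetic) 640x480 renders;
# verify these against the repo's data loader before relying on them.
NOCS_CAMERA_K = np.array([
    [577.5,   0.0, 319.5],
    [  0.0, 577.5, 239.5],
    [  0.0,   0.0,   1.0],
])

def intrinsics_mismatch(K_custom, K_ref=NOCS_CAMERA_K):
    """Relative differences of focal lengths and principal point vs. the reference."""
    return {
        "fx_rel_err": abs(K_custom[0, 0] - K_ref[0, 0]) / K_ref[0, 0],
        "fy_rel_err": abs(K_custom[1, 1] - K_ref[1, 1]) / K_ref[1, 1],
        "cx_rel_err": abs(K_custom[0, 2] - K_ref[0, 2]) / K_ref[0, 2],
        "cy_rel_err": abs(K_custom[1, 2] - K_ref[1, 2]) / K_ref[1, 2],
    }

# Example: a hypothetical custom 640x480 camera with a wider field of view.
K_custom = np.array([
    [554.3,   0.0, 320.0],
    [  0.0, 554.3, 240.0],
    [  0.0,   0.0,   1.0],
])
errs = intrinsics_mismatch(K_custom)
```

If the relative errors are more than a few percent, the learned heat-map and depth heads see geometry they were never trained on, which matches the failure mode described above.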

(image: example of the scene-level composed depth map used as input to the model)
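For point 2, one way to approximate a scene-centric depth map from an object-centric render is to paste the object depth onto a background plane. This is only a rough sketch of the idea: `compose_scene_depth` is a hypothetical helper, and the background distance and millimetre units are assumptions; check how your renderer scales depth and how the repo's loader reads `camera_composed_depths` before using anything like this:

```python
import numpy as np

def compose_scene_depth(obj_depth_mm, background_mm=2000):
    """Paste an object-centric depth render (0 outside the object) onto a
    flat background plane to mimic a scene-level composed depth map.
    background_mm and the mm units are assumptions, not repo behavior."""
    scene = np.full_like(obj_depth_mm, background_mm)
    mask = obj_depth_mm > 0  # pixels that belong to the object
    scene[mask] = obj_depth_mm[mask]
    return scene

# Usage: a toy 4x4 object-centric depth map with the object at 0.8 m.
obj = np.zeros((4, 4), dtype=np.uint16)
obj[1:3, 1:3] = 800
scene = compose_scene_depth(obj)
```

A flat plane is a crude stand-in for real clutter, but it at least moves the depth input toward the scene-centric statistics the model saw during training.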

Hope this helps!