zubair-irshad / shapo

PyTorch code for the ECCV'22 paper "ShAPO: Implicit Representations for Multi-Object Shape, Appearance and Pose Optimization"

About pretrained models for inference #9

Closed: dongho-Han closed this 1 year ago

dongho-Han commented 1 year ago

Hello. First, thank you for the answers in #7. I could run the 'inference' and 'optimize' code with the uploaded pretrained weights. For further evaluation, I have a few questions.

I understand that sdf_latentcode, sdf_rgb_net_weights, and the trained weights (shapo_real.ckpt) are needed.

  1. Can you explain the meaning of each of the weights listed above?

  2. If I run the 'train' phase as you described, I think I can only train the network weights (shapo_real.ckpt). Is that right? If so, where do sdf_latentcode and sdf_rgb_net_weights come from?

  3. To run the inference code on the full NOCS dataset (Real test and Real train splits), can I use all the pretrained weights listed above?

  4. To run the inference code on other datasets, can I use the same pretrained weights? If not, which weights do I have to change?

I really appreciate your project!

zubair-irshad commented 1 year ago

Hi @dongho-Han, glad you were able to run inference and optimization using our code.

  1. sdf_latentcode and sdf_rgb_net_weights are pre-trained checkpoints for the shape and appearance parts of the network, respectively. Note that this pretraining is done in advance of training the network on RGB-D observations and acts as a strong prior for our model. shapo_real.ckpt, in contrast, contains the pretrained weights of the network trained on just RGB-D observations. For supervision and inference (i.e. shape and appearance decoding), we freeze the sdf and rgb weights of our network, since they are pretrained beforehand on 3D textured CAD models and carry a strong 3D prior (see the first sketch after this list).

  2. Correct. We do not release the training code for the shape and appearance latent codes; we only release the pretrained shape and appearance networks. If you are interested, I can help you reproduce this part as much as possible (do you mind creating a separate thread/issue for it?).

  3. Correct, you can use these pretrained weights to perform inference/optimization on NOCS Real, as we already show in our Google Colab.

  4. Our network generalizes reasonably well to new RGB-D observations within the same categories; see the zero-shot results on the HSR robot on our website. We did find, however, that to get good performance the camera specs of the captured RGB-D observations should match the intrinsics here. You can either capture your data with the same camera intrinsics or post-process the images, warping them slightly to match these intrinsics (a sketch of that warp follows this list). If you wish to avoid this, we highly recommend fine-tuning the model on a small subset of your own real data in a supervised way, as we mentioned here (number 3).
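To make the freezing in point 1 concrete, here is a minimal PyTorch sketch of the load-then-freeze pattern. Everything in it (the module class, checkpoint paths, and the 64-D latent size) is a hypothetical stand-in for illustration, not the actual ShAPO code:

```python
import torch
import torch.nn as nn

# Stand-in modules: the real ShAPO architecture differs; this only
# illustrates the load-then-freeze pattern described in point 1.
class TinyMLP(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(),
                                 nn.Linear(64, d_out))

    def forward(self, x):
        return self.net(x)

sdf_net = TinyMLP(64 + 3, 1)  # (shape latent, xyz) -> signed distance
rgb_net = TinyMLP(64 + 3, 3)  # (appearance latent, xyz) -> RGB color

# Load the pretrained checkpoints (paths are hypothetical placeholders):
# sdf_net.load_state_dict(torch.load("sdf_net.pth", map_location="cpu"))
# rgb_net.load_state_dict(torch.load("rgb_net.pth", map_location="cpu"))

# Freeze both pretrained branches so gradients only flow into the RGB-D
# network during training (and into the latent codes during optimization).
for net in (sdf_net, rgb_net):
    for p in net.parameters():
        p.requires_grad = False
    net.eval()
```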
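On the intrinsics point in 4: for an undistorted pinhole camera, changing only the intrinsics (same camera center) remaps pixels by the depth-independent homography `H = K_tgt @ inv(K_src)`, so the warp can be done directly in image space. A rough OpenCV sketch, with placeholder matrices rather than the actual NOCS camera parameters:

```python
import cv2
import numpy as np

# Placeholder intrinsics: substitute your camera's calibration (K_src) and
# the target intrinsics from the repo's config (K_tgt); these numbers are
# purely illustrative.
K_src = np.array([[600.0,   0.0, 320.0],
                  [  0.0, 600.0, 240.0],
                  [  0.0,   0.0,   1.0]])
K_tgt = np.array([[577.5,   0.0, 319.5],
                  [  0.0, 577.5, 239.5],
                  [  0.0,   0.0,   1.0]])

# For a pure change of intrinsics (same camera center, no lens distortion)
# the pixel remapping is depth-independent: p' = K_tgt @ inv(K_src) @ p.
H = K_tgt @ np.linalg.inv(K_src)

img = cv2.imread("rgb.png")
warped = cv2.warpPerspective(img, H, (img.shape[1], img.shape[0]))

# Warp the depth map with nearest-neighbor interpolation so depth values
# are not blended across object boundaries.
depth = cv2.imread("depth.png", cv2.IMREAD_UNCHANGED)
depth_warped = cv2.warpPerspective(depth, H, (depth.shape[1], depth.shape[0]),
                                   flags=cv2.INTER_NEAREST)
```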

For new categories, I highly recommend training the model from scratch if the dataset or categories differ from NOCS Real. Essentially, you will only have to train the shape and appearance networks as well as our ShAPO model for any custom categories or dataset; a simplified sketch of what that shape/appearance pretraining loop looks like is below.
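Since the shape/appearance pretraining code is not released, here is a heavily simplified, DeepSDF-style auto-decoder sketch of what such a pretraining loop typically looks like: per-object latent codes are optimized jointly with the decoder weights. All sizes, the loss, and the data sampling below are assumptions for illustration, not the paper's exact recipe:

```python
import torch
import torch.nn as nn

# Assumed sizes and hyperparameters, purely for illustration.
num_shapes, latent_dim = 1000, 64

# Decoder maps (per-shape latent code, xyz) -> signed distance.
decoder = nn.Sequential(nn.Linear(latent_dim + 3, 256), nn.ReLU(),
                        nn.Linear(256, 1))
latents = nn.Embedding(num_shapes, latent_dim)  # one code per CAD model
nn.init.normal_(latents.weight, std=0.01)

opt = torch.optim.Adam([*decoder.parameters(), *latents.parameters()], lr=1e-4)

def sample_batch():
    """Stand-in for real data loading: (shape ids, xyz samples, GT SDF)."""
    idx = torch.randint(0, num_shapes, (4096,))
    xyz = torch.rand(4096, 3) * 2 - 1
    sdf_gt = torch.zeros(4096, 1)  # replace with SDF values from CAD meshes
    return idx, xyz, sdf_gt

for step in range(10):  # a real run trains for many more iterations
    idx, xyz, sdf_gt = sample_batch()
    pred = decoder(torch.cat([latents(idx), xyz], dim=-1))
    # Clamped L1 SDF loss plus a small regularizer on the latent codes,
    # in the spirit of DeepSDF-style auto-decoders.
    loss = nn.functional.l1_loss(pred.clamp(-0.1, 0.1), sdf_gt.clamp(-0.1, 0.1))
    loss = loss + 1e-4 * latents(idx).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```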

Hope it helps!