zubair-irshad / CenterSnap

PyTorch code for ICRA'22 paper: "Single-Shot Multi-Object 3D Shape Reconstruction and Categorical 6D Pose and Size Estimation"
https://zubair-irshad.github.io/projects/CenterSnap.html

Question regarding latent embedding #11

Open · HannahHaensen opened this issue 2 years ago

HannahHaensen commented 2 years ago

Is the latent embedding required?

https://github.com/zubair-irshad/CenterSnap/blob/c2afd120428b0a07c88894da23311995b72bbbfd/prepare_data/generate_data_nocs.py#L202

How do I obtain this for a new dataset? Do I have to train NOCS?

zubair-irshad commented 2 years ago

Thanks for your interest in our work. To train CenterSnap on your own dataset, yes, you need latent embeddings. We obtain them by pre-training an auto-encoder (please see Figure 2 of our paper).

To pre-train the auto-encoder, we provide additional scripts under external/shape_pretraining. To train it on your own dataset, you will need CAD models (please see the data preparation here). These CAD models should be the same as the ones used for rendering the RGB images (for synthetic data), or obtained using any scanning tool for training/fine-tuning on real data (just like NOCS). Also see the NOCS object models here.
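
For concreteness, here is a minimal sketch of what such a shape auto-encoder pre-training stage could look like, assuming a PointNet-style encoder, an MLP decoder, and a Chamfer-distance reconstruction loss. All names, dimensions, and the data loading are illustrative, not the repo's exact code (see external/shape_pretraining for that):

```python
# Illustrative sketch of shape auto-encoder pre-training (not the repo's exact
# code). Assumptions: a PointNet-style encoder, an MLP decoder, a Chamfer loss,
# and point clouds sampled from canonically aligned CAD models.
import torch
import torch.nn as nn

LATENT_DIM = 128   # assumed latent embedding size
NUM_POINTS = 2048  # assumed number of points sampled per CAD model

class ShapeAutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Per-point MLP followed by a global max-pool (PointNet-style).
        self.point_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 256, 1),
        )
        self.to_latent = nn.Linear(256, LATENT_DIM)
        # MLP decoder regressing a fixed-size point set from the latent code.
        self.decoder = nn.Sequential(
            nn.Linear(LATENT_DIM, 512), nn.ReLU(),
            nn.Linear(512, 1024), nn.ReLU(),
            nn.Linear(1024, NUM_POINTS * 3),
        )

    def encode(self, pts):                              # pts: (B, N, 3)
        feat = self.point_mlp(pts.transpose(1, 2))      # (B, 256, N)
        return self.to_latent(feat.max(dim=2).values)   # (B, LATENT_DIM)

    def forward(self, pts):
        z = self.encode(pts)
        return z, self.decoder(z).view(-1, NUM_POINTS, 3)

def chamfer_distance(a, b):
    # Symmetric Chamfer distance between point sets a: (B, N, 3), b: (B, M, 3).
    d = torch.cdist(a, b)                               # (B, N, M)
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

model = ShapeAutoEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for pts in []:  # replace with a DataLoader over sampled CAD point clouds
    z, recon = model(pts)
    loss = chamfer_distance(recon, pts)
    opt.zero_grad(); loss.backward(); opt.step()
```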

Note that we do not require CAD models at inference time. It really depends on how you want to train on a new dataset. The shape pre-training stage learns one latent embedding vector per shape, so it might be better to train on any new CAD models combined with the NOCS synthetic CAD models (assuming they fall within the same categories as NOCS) so that the auto-encoder learns a better prior over all the 3D shapes it sees. Hope it helps!
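
As a hypothetical follow-on to the sketch above: once the auto-encoder is trained, its frozen encoder maps every CAD model in the combined training set (your new models plus the NOCS/ShapeNet ones) to its GT latent embedding, which the data-generation step then stores per object instance:

```python
# Hypothetical continuation of the ShapeAutoEncoder sketch above: build a
# lookup table of GT latent embeddings, one per CAD model, for data generation.
import torch

model.eval()
latent_table = {}
with torch.no_grad():
    # Replace [] with an iterable of (model_id, (1, N, 3) point cloud) pairs.
    for model_id, pts in []:
        latent_table[model_id] = model.encode(pts).squeeze(0)  # (LATENT_DIM,)
```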

HannahHaensen commented 2 years ago

Thank you for the detailed answer! I will try it.

yuanzhen2020 commented 2 years ago

Great work! May I ask: is the GT latent embedding code for training generated from the CAD models of the training set? And is it possible to generate the latent embedding code from a partial point cloud (obtained from the masked depth)?

zubair-irshad commented 2 years ago
  1. Yes, that's correct. We generate the GT latent embedding code by training an auto-encoder on the CAD models available in the training set (please also see the response above for details).
  2. Unfortunately, it is not possible to use partial point clouds (i.e. masked depth maps) directly, since they do not capture the full extent of the objects, which is crucial for recovering accurate size, and hence absolute pose and an eventually accurate reconstruction. Please see this and this for how we obtain sizes and rotated bounding boxes from canonical bounding boxes (a small sketch of that computation follows below). Note that we use these CAD models and GT latent embeddings as a strong prior so that we can generalize to unseen images (i.e. we do not require any CAD models during testing; we only require a single RGB-D image), and hence the more accurately the prior is learned, the better the generalization performance.
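
To illustrate the size point in item 2, here is a small sketch (again illustrative, not the repo's exact code) of how the extents of a canonical CAD point cloud become a rotated 3D box once the GT scale, rotation, and translation are applied; a partial point cloud from a masked depth map would under-estimate these extents:

```python
# Illustrative sketch: object size from the canonical CAD point cloud, then a
# rotated 3D bounding box via the GT similarity transform (scale s, rotation R,
# translation t). Not the repo's exact code.
import numpy as np

def canonical_bbox_size(canonical_pts: np.ndarray) -> np.ndarray:
    # canonical_pts: (N, 3) points of the full CAD model in its canonical frame.
    # A partial (masked-depth) cloud would miss surface area and shrink this.
    return canonical_pts.max(axis=0) - canonical_pts.min(axis=0)  # (3,) extents

def rotated_bbox_corners(size, s, R, t):
    # Build the 8 corners of the canonical box centered at the origin, then map
    # them into the camera frame with the GT similarity transform.
    hx, hy, hz = size / 2.0
    corners = np.array([[x, y, z] for x in (-hx, hx)
                                  for y in (-hy, hy)
                                  for z in (-hz, hz)])  # (8, 3)
    return s * corners @ R.T + t  # (8, 3) box corners in the camera frame
```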

Hope it helps!

HannahHaensen commented 1 year ago

I have another question regarding this pre-training: the paper says you use the ShapeNet CAD models, but here the CAD models from NOCS are used. Or am I reading the README wrong?

zubair-irshad commented 1 year ago

@HannahHaensen, the CAD models used by NOCS are exactly the ShapeNet models. The difference is that NOCS trains on only 6 ShapeNet categories and selects a subset of the ShapeNet models, which they render into tabletop scenes to create their synthetic train set. For the paper, we only train on the subset of ShapeNet models (6 categories only) used by NOCS, but we have also trained our auto-encoder on all ShapeNet models (unfortunately we cannot release the pre-trained checkpoints for that). Note that, if you wish to, you can train the auto-encoder on all ShapeNet categories using the same train script. Hope this helps!