zubair-irshad / CenterSnap

PyTorch code for the ICRA'22 paper: "Single-Shot Multi-Object 3D Shape Reconstruction and Categorical 6D Pose and Size Estimation"
https://zubair-irshad.github.io/projects/CenterSnap.html

Autoencoder Embedding Questions #13

Closed Kasai2020 closed 2 years ago

Kasai2020 commented 2 years ago
  1. When setting up the dataset to train the autoencoder, all of the point clouds are inverted along the z-axis. This can be seen in shape_data.py line 43:

    model_points = model_points * np.array([[1.0, 1.0, -1.0]])

    What is the reason for inverting each point cloud here? Also, after training the autoencoder and using it to compute the embeddings for the dataset, the objects do not appear to be inverted before being passed into the autoencoder.

  2. At some points in the code, the latent embedding is divided/multiplied by 100. For example in abs_pose_outputs.py line 59:

    latent_emb = latent_emb / 100.0

    Is there a reason why this is done? It seems to happen whenever the embedding is converted to a torch tensor or a NumPy array. Thanks!
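For context on the first question, multiplying by `[[1.0, 1.0, -1.0]]` is equivalent to applying `diag(1, 1, -1)`, i.e. a reflection across the z = 0 plane. A minimal sketch with a hypothetical toy point cloud:

```python
import numpy as np

# Hypothetical toy point cloud (N x 3); the real models are loaded in shape_data.py.
model_points = np.array([[0.1, 0.2, 0.3],
                         [0.4, -0.5, 0.6]])

# The line in question is equivalent to applying diag(1, 1, -1),
# i.e. mirroring every point across the z = 0 plane.
flip = np.diag([1.0, 1.0, -1.0])
flipped = model_points * np.array([[1.0, 1.0, -1.0]])
assert np.allclose(flipped, model_points @ flip.T)

# A reflection has determinant -1, so it changes the handedness of the
# model's coordinate frame (it is not a rotation).
print(np.linalg.det(flip))  # -1.0
```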

zubair-irshad commented 2 years ago

Hi @Kasai2020.

  1. We borrow part of this data-processing codebase from object-deformnet. I believe the reason is to have all the CAD models in canonical coordinates, so that multiplying them by their ground-truth absolute pose and size transforms them into the world frame. Please see this to transform the predicted/GT point clouds to the world frame and visualize them for better understanding. For a new dataset, you don't have to make this conversion as long as your transformed point clouds and ground-truth canonical point clouds are related by the rotation, translation, and scale you are using.
  2. This is a hyperparameter. We tuned it manually to bring the ground truth into a reasonable range for the neural network to regress. Note that during inference we multiply by 100 again to recover the actual value of the embedding, which is then passed to the decoder. Hope it helps!
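A minimal sketch of both points above (all values and names here are illustrative, not taken from the repository): the canonical-to-world relation is a similarity transform, and the factor-of-100 scaling is a simple round trip.

```python
import numpy as np

# --- Point 1: canonical model -> world frame ---
# The relation is a similarity transform: world = scale * (R @ canonical) + t.
canonical_points = np.array([[0.1, 0.0, 0.2],
                             [0.0, 0.3, -0.1]])  # hypothetical canonical CAD points
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])                 # 90-degree rotation about z
t = np.array([0.5, 0.0, 1.0])                    # translation
scale = 2.0                                      # isotropic size factor

world_points = scale * (canonical_points @ R.T) + t

# --- Point 2: the factor-of-100 scaling round trip ---
latent_emb = np.random.randn(128)   # hypothetical 128-d latent embedding
target = latent_emb / 100.0         # shrunk into an easier range to regress
recovered = target * 100.0          # multiplied back at inference before decoding
assert np.allclose(recovered, latent_emb)
```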
Kasai2020 commented 2 years ago

Thank you again for the fast reply. That makes a lot of sense!