zc-alexfan / hold

[CVPR 2024✨Highlight] Official repository for HOLD, the first method that jointly reconstructs articulated hands and objects from monocular videos without assuming a pre-scanned object template and 3D hand-object training data.
https://zc-alexfan.github.io/hold

Inconsistency in HO3D Data's Prediction #17

Open alakhag opened 1 month ago

alakhag commented 1 month ago

Thank you so much for this amazing work!

I see that you do not use the provided 3D annotations but instead use COLMAP estimates for the HO3D objects. I wanted to know: if I want to use the HO3D annotations (mainly from the meta pickle files), how can I turn them into a dataset?

Just for reference, HO3D's meta pickle files contain the per-frame hand and object annotations, and their camera coordinate system assumes the camera at the origin facing along the -Z direction.
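For context, this is roughly how I read one frame's annotations (a minimal sketch; the key names follow the public HO3D toolkit and the path is hypothetical, so please verify against your local copy):

```python
import pickle
import numpy as np
from scipy.spatial.transform import Rotation

# Minimal sketch: load one HO3D meta pickle (path is hypothetical).
with open("HO3D_v3/train/ShSu10/meta/0000.pkl", "rb") as f:
    meta = pickle.load(f)

hand_pose = meta["handPose"]    # (48,) MANO axis-angle pose
hand_trans = meta["handTrans"]  # (3,) MANO translation
hand_beta = meta["handBeta"]    # (10,) MANO shape
obj_rot = meta["objRot"]        # (3, 1) object rotation, axis-angle
obj_trans = meta["objTrans"]    # (3,) object translation
cam_mat = meta["camMat"]        # (3, 3) camera intrinsics

# HO3D uses an OpenGL-style camera (origin, looking down -Z). To move to an
# OpenCV-style convention (+Z forward), flip the Y and Z axes:
coord_change = np.diag([1.0, -1.0, -1.0])
R_obj = Rotation.from_rotvec(obj_rot.ravel()).as_matrix()
R_obj_cv = coord_change @ R_obj          # rotation in OpenCV camera frame
t_obj_cv = coord_change @ obj_trans      # translation in OpenCV camera frame
```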

We also have HO3D segmentations that can be directly used as masks.

I am just trying to figure out what transformations I need to apply to the hand MANO parameters or the object transformation parameters to generate data.npy and train HOLD.

Edit: I tried visualizing data.npy of ShSu10 to inspect the canonical space. The canonical camera locations seem very inconsistent, both (1) with the original HO3D object models and (2) with each other. Below are the camera locations with respect to the object in canonical space.

[Screenshot from 2024-07-31 18-59-08: camera locations in canonical space]

Moreover, I chose two frames, 0000 and 0129, where I would expect the camera views to face roughly opposite directions.

However, I get the following view directions for the cameras' principal axes in canonical space. [Screenshot from 2024-07-31 19-08-08: principal-axis view directions in canonical space]
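To quantify this rather than eyeball it, I compare the two principal-axis directions (a small sketch; the inputs are the unit view directions extracted above, nothing HOLD-specific):

```python
import numpy as np

def angle_between_deg(dir_a, dir_b):
    """Angle between two unit view directions; ~180 deg = opposing views."""
    cos = np.clip(np.dot(dir_a, dir_b), -1.0, 1.0)
    return np.degrees(np.arccos(cos))
```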

How I get the canonical-space camera:

get_camera_param() gives `cam_loc` and `ray_dirs` in deformed space; applying the deformer's `tf_mat` with `inverse=True` should then give the camera in canonical space.
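Concretely, this is what I do (a sketch; the shapes of `cam_loc`, `ray_dirs`, and `tf_mat` are my assumptions about HOLD's internals, so take the exact indexing with a grain of salt):

```python
import torch

def camera_to_canonical(cam_loc, ray_dirs, tf_mat):
    """Map a deformed-space camera into canonical space.

    cam_loc:  (N, 3) camera locations from get_camera_param()
    ray_dirs: (N, R, 3) ray directions from get_camera_param()
    tf_mat:   (N, 4, 4) canonical-to-deformed transform from the deformer
    """
    tf_inv = torch.inverse(tf_mat)  # deformed -> canonical
    R_inv = tf_inv[:, :3, :3]
    t_inv = tf_inv[:, :3, 3]
    cam_loc_c = torch.einsum("nij,nj->ni", R_inv, cam_loc) + t_inv
    # Directions only rotate (no translation); re-normalize in case
    # tf_mat carries any scale.
    ray_dirs_c = torch.einsum("nij,nrj->nri", R_inv, ray_dirs)
    ray_dirs_c = torch.nn.functional.normalize(ray_dirs_c, dim=-1)
    return cam_loc_c, ray_dirs_c
```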

I also tried updating data.npy in the following way:

When I train with this information, even though the camera visualization, etc., looks good, I encounter NaN weights after the first iteration, and training breaks in the second iteration. The warning in the first iteration is: `FutureWarning: Non-finite norm encountered in torch.nn.utils.clip_grad_norm_; continuing anyway. Note that the default behavior will change in a future release to error out if a non-finite total norm is encountered. At that point, setting error_if_nonfinite=false will be required to retain the old behavior.`
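To localize where things first go non-finite, I added generic PyTorch checks around the backward pass (a sketch, not HOLD-specific; `model` and `loss` stand in for whatever the training loop actually names them):

```python
import torch

# Name the first op that produces NaN/Inf during backward (slow, debug only).
torch.autograd.set_detect_anomaly(True)

def check_finite(model, loss):
    """Fail fast on a non-finite loss or gradient after loss.backward()."""
    assert torch.isfinite(loss), f"non-finite loss: {loss.item()}"
    for name, p in model.named_parameters():
        if p.grad is not None and not torch.isfinite(p.grad).all():
            raise RuntimeError(f"non-finite grad in {name}")

# After loss.backward():
#   check_finite(model, loss)
#   torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0,
#                                  error_if_nonfinite=True)
```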

I'm still not sure what is wrong.