Inconsistency in HO3D Data's Prediction

Thank you so much for this amazing work!

I see that you do not use 3D Annotations provided, but use COLMAP estimations for HO3D objects. I wanted to know if I want to utilize HO3D Annotations (mainly from meta pickle files), how can I utilize them to create dataset.

Below: Camera Coordinate System assumes camera at origin facing along -Z direction Just for reference: HO3D data's meta pickle file contains:

beta, pose, trans of hand in Camera Coordinate System
Object Rotation and Translation in Camera Coordinate System
Camera Intrinsic Matrix

We also have HO3D segmentations that can be directly used as masks.

I am just trying to figure out what transformations in hand MANO parameters or Object Transformation parameters I need to do to be able to generate data.npy and train HOLD.

Edit: I tried visualizing data.npy of ShSu10, to see the canonical space. I find that canonical camera locations seem very inconsistent (1. with given original HO3D's models) and (2. among each other). Following are the camera locations with respect to object in Canonical Space.

Screenshot from 2024-07-31 18-59-08

Moreover, I choose 2 frames where I would expect camera views to be facing opposite to each other: 0000 0129

However, I get these view directions of camera's principle axis in canonical space. Screenshot from 2024-07-31 19-08-08

How I get canonical space camera:

get_camera_param() gives 'cam_loc' and 'ray_dirs' in deformed space. Using 'tf_mat' in deformer with "inverse=True" should give camera in canonical space.

I also tried this:

I tried updating data.npy in the following way:

Update all scale_mat_i to identity matrix
Update all world_mat_i to camera intrinsic values, with 3rd column 0.
Update normalize_shift to [0,0,0]
Update entity.right's hand pose, trans and mean_shape from the given HO3D's ShSu10's meta pickle files
Update entity.object's parameters
1. set obj_scale as 1.0
2. set norm_mat as identity
3. set object_poses from objRot and objTrans in meta pickle files.

When I train with this information, even though camera visualization, etc. looks good, I probably encounter nan weights after 1st iteration, and the training breaks in 2nd iteration. The warning in 1st iteration is: FutureWarning: Non-finite norm encountered in torch.nn.utils.clip_grad_norm_; continuing anyway. Note that the default behavior will change in a future release to error out if a non-finite total norm is encountered. At that point, setting error_if_nonfinite=false will be required to retain the old behavior.

Not sure what is wrong.

zc-alexfan / hold

Inconsistency in HO3D Data's Prediction #17

How I get canonical space camera:

I also tried this: