zc-alexfan / arctic

[CVPR 2023] Official repository for downloading, processing, visualizing, and training models on the ARCTIC dataset.
https://arctic.is.tue.mpg.de

Intrinsics in visualizer #19

Closed ap229997 closed 1 year ago

ap229997 commented 1 year ago

For visualizing meshes by projecting them onto the image, shouldn't the intrinsics in https://github.com/zc-alexfan/arctic/blob/f91ca2b16f02c4f196ae2b99cf21f5d81486ce45/scripts_method/visualizer.py#L85-L88 be taken from meta_info in the predictions rather than hard-coded here?

zc-alexfan commented 1 year ago

The hard-coded intrinsics are for the models. Following HMR and other hand/body regressors, we use a fixed focal length (1000.0) with a weak-perspective camera.

You can also use the ground-truth intrinsics in meta_info, but you might need to re-train the model. We did try training with ground-truth intrinsics for the MANO parameter regression, but it was rather unstable. I think the problem was that the model then has to adapt to the effective focal-length changes introduced by cropping and resizing with different bounding boxes.
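For concreteness, here is a minimal sketch of what a fixed-focal-length, weak-perspective setup typically looks like in HMR-style regressors. Only the 1000.0 focal length comes from the code above; the image resolution, principal point, and function names are illustrative assumptions, not the repository's actual implementation:

```python
import numpy as np

def fixed_intrinsics(focal=1000.0, img_res=224):
    """Intrinsics with a fixed focal length and the principal point at the
    image center (illustrative values; only the 1000.0 focal length is
    taken from the discussion above)."""
    return np.array([
        [focal, 0.0, img_res / 2.0],
        [0.0, focal, img_res / 2.0],
        [0.0, 0.0, 1.0],
    ])

def weak_persp_to_translation(scale, tx, ty, focal=1000.0, img_res=224):
    """HMR-style conversion of a predicted weak-perspective camera
    (scale, tx, ty) into a 3D translation under the fixed focal length:
    a larger predicted scale corresponds to a smaller depth tz."""
    tz = 2.0 * focal / (img_res * scale + 1e-9)
    return np.array([tx, ty, tz])
```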

ap229997 commented 1 year ago

Also, why are ground-truth intrinsics used during evaluation on egocentric images (why not the hard-coded values, or ground-truth intrinsics for the allocentric images as well)? https://github.com/zc-alexfan/arctic/blob/f91ca2b16f02c4f196ae2b99cf21f5d81486ce45/src/datasets/arctic_dataset.py#L406-L410

zc-alexfan commented 1 year ago

I wrote this last year. As I recall, the reason is that an intrinsics matrix with use_gt_k=False causes large misalignment when the meshes are projected into pixel space in the egocentric setting. This is because the weak-perspective camera assumes the meshes are far away, but in the egocentric setting they are close up.

Therefore, we use the ground-truth intrinsics here so that the 2D projections of the meshes onto the images are better aligned.
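As a rough illustration of why the choice of K matters for alignment, here is a hypothetical projection helper (not the repository's code): projecting the same camera-space vertices with the hard-coded K versus the ground-truth K from meta_info gives noticeably different pixel locations when the hand and object are close to an egocentric camera.

```python
import numpy as np

def project_to_pixels(verts_cam, K):
    """Perspective projection of Nx3 camera-space vertices with intrinsics K.
    Close-up geometry (egocentric views) makes the result very sensitive to K,
    which is why ground-truth intrinsics are used in that setting."""
    proj = (K @ verts_cam.T).T           # (N, 3)
    return proj[:, :2] / proj[:, 2:3]    # divide by depth -> (N, 2) pixel coords

# e.g. compare project_to_pixels(verts, K_fixed) vs project_to_pixels(verts, K_gt)
```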

Now, since we are using ground-truth intrinsics here, we need to make sure the intrinsics do not change, to keep model training feasible. Therefore, we do not apply scaling in data augmentation, which is why augm_dict["sc"] = 1.0. Since the image scale does not change, the intrinsics do not change either.
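To see why scale augmentation would conflict with fixed ground-truth intrinsics, here is a hypothetical sketch (not the repository's augmentation code): resizing an image by a factor sc effectively rescales the focal lengths and principal point, so keeping augm_dict["sc"] = 1.0 leaves K valid as-is.

```python
import numpy as np

def rescale_intrinsics(K, sc):
    """If the image were resized by a factor `sc`, the effective intrinsics
    would change like this; with sc = 1.0 the matrix is unchanged, which is
    exactly what keeping augm_dict["sc"] = 1.0 guarantees."""
    K = K.copy()
    K[0, 0] *= sc  # fx
    K[1, 1] *= sc  # fy
    K[0, 2] *= sc  # cx
    K[1, 2] *= sc  # cy
    return K
```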