shangbuhuan13 / SO-Pose

This repository contains the code for the ICCV 2021 paper: SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation
Apache License 2.0

question about implementation of 2D cross layer consistency #11

Closed RuyiLian closed 2 years ago

RuyiLian commented 2 years ago

Hi,

Thanks again for your great work. I have a question about the implementation of the 2D consistency loss: https://github.com/shangbuhuan13/SO-Pose/blob/a3a61d2c97b1084a4754d6c12e45e16d85809729/core/gdrn_selfocc_modeling/losses/crosstask_projection_loss.py#L158

I am confused about why the loss is divided by 572.3. In datasets/BOP_DATASETS/lm/camera.json, the camera information is:

{
  "cx": 325.2611,
  "cy": 242.04899,
  "depth_scale": 1.0,
  "fx": 572.4114,
  "fy": 573.57043,
  "height": 480,
  "width": 640
}

Also, will this affect the YCBV dataset, since it has different camera intrinsic parameters? Thanks!

shangbuhuan13 commented 2 years ago

Thanks for your question. The constant is the camera focal length, and it is used to balance the weights of the loss terms, so you can adjust it for your dataset accordingly.
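
As a rough illustration of "adjusting it accordingly", here is a minimal sketch that reads the focal length from a BOP-style camera.json instead of hard-coding 572.3. The helper names `focal_normalizer` and `normalized_pixel_loss` are hypothetical and are not part of the SO-Pose code. Averaging fx and fy is an assumption made here for simplicity.

```python
import json

# Camera intrinsics copied from the lm/camera.json quoted in the question.
LM_CAMERA_JSON = """
{"cx": 325.2611, "cy": 242.04899, "depth_scale": 1.0,
 "fx": 572.4114, "fy": 573.57043, "height": 480, "width": 640}
"""


def focal_normalizer(camera_json_text):
    """Hypothetical helper: pick one focal length to divide a pixel loss by."""
    cam = json.loads(camera_json_text)
    # Assumption: use the mean of fx and fy as a single scalar focal length.
    return 0.5 * (cam["fx"] + cam["fy"])


def normalized_pixel_loss(pixel_errors, focal):
    """Mean absolute pixel error, divided by the focal length."""
    mean_abs = sum(abs(e) for e in pixel_errors) / len(pixel_errors)
    return mean_abs / focal


focal = focal_normalizer(LM_CAMERA_JSON)
# Toy pixel residuals, made up purely for illustration.
loss = normalized_pixel_loss([5.0, -3.0, 4.0], focal)
print(focal, loss)
```

For another dataset, the same helper would be fed that dataset's camera.json, so the normalizer follows its intrinsics rather than the LM value.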

shangbuhuan13 commented 2 years ago

However, I do not remember whether I changed this parameter. I think it may not affect the final result much.

RuyiLian commented 2 years ago

Thanks for your reply!

RuyiLian commented 2 years ago

Sorry to bother you again. Could you give the intuition for using 1/f as the weight? I could not find the explanation in the paper (maybe I just missed it). Thanks!

shangbuhuan13 commented 2 years ago

I forgot the details. But usually, I use the weights to balance each term so that their initial ranges are similar and they can all take effect during training.

shangbuhuan13 commented 2 years ago

I probably used 1/f because the 2D projections are measured on the image, while the other losses are defined in 3D space, so 1/f balances the terms. According to the camera projection model, Zp = KP, so p = fP/Z and p/f = P/Z (ignoring the principal point offset, which cancels when two projections are subtracted). The other terms are defined in 3D space, e.g. |P - P_gt|, while the 2D loss is defined on p, so the ratio between them is roughly 1/f, given that on LMO the depth is typically 1-2 m.
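
The scaling argument above can be checked numerically with a small sketch. It assumes a simple pinhole model with a single focal length f, and the 3D points are made up for illustration. The function `project` is a hypothetical helper, not the repository's projection code.

```python
def project(point, f, cx, cy):
    """Pinhole projection of a 3D point (X, Y, Z) in meters to pixels."""
    X, Y, Z = point
    return (f * X / Z + cx, f * Y / Z + cy)


# fx, cx, cy taken from the lm/camera.json quoted in the question.
# Using fx as the single focal length is a simplification.
f, cx, cy = 572.4114, 325.2611, 242.04899

# Made-up ground truth and prediction: 1 cm error in X at 1 m depth.
P_gt = (0.10, -0.05, 1.0)
P_pred = (0.11, -0.05, 1.0)

p_gt = project(P_gt, f, cx, cy)
p_pred = project(P_pred, f, cx, cy)

err_3d = abs(P_pred[0] - P_gt[0])   # error in meters
err_px = abs(p_pred[0] - p_gt[0])   # error in pixels, about f * err_3d / Z
err_px_scaled = err_px / f          # back to the 3D scale when Z is near 1 m

print(err_3d, err_px, err_px_scaled)
```

At 1 m depth the scaled pixel error matches the 3D error. At 2 m it would be half as large, so 1/f only brings the two terms to a similar order of magnitude.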

shangbuhuan13 commented 2 years ago

I guess this was my initial motivation for using 1/f.

RuyiLian commented 2 years ago

Thanks for your reply! This is really helpful.