naver / mast3r

Grounding Image Matching in 3D with MASt3R

Incorporating Known Camera Poses in Global Alignment Optimization #26

Open resurgo97 opened 1 month ago

resurgo97 commented 1 month ago

I'm seeking a way to run global alignment optimization with known camera poses. After examining the code, I've identified some challenges:

  1. cam2w matrices are constructed in make_K_cam_depth, but they don't directly correspond to standard extrinsic matrices.

  2. The code handles scale ambiguity by adjusting the translation vector based on focal length and scale, rather than by multiplying the pointmaps by per-frame scales.

  3. This approach leverages the fact that scaling up the pointmap is equivalent to: a) increasing the focal length b) shifting the camera location along the z-axis

  4. As a result, when mapping 2D points into 3D space, the depthmaps don't need to be multiplied by per-frame scales:

    pts3d = proj3d(invK[img], pixels, depthmaps[img][idxs] * offsets)

While this implementation is convenient, it makes incorporating known camera poses challenging because cam2w is not a pure extrinsic matrix and is entangled with the focal length (an intrinsic parameter).
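To make the scale ambiguity concrete, here is a minimal pinhole sketch in plain Python. The helpers `unproject` and `project` are hypothetical names for illustration only; they are not the repository's `proj3d`, whose exact signature I have not confirmed. The sketch only shows that multiplying the depth by a factor scales the 3D point by the same factor while leaving its pixel unchanged, which is why the scale has to be absorbed somewhere (in the pointmap, or in the camera parameters).

```python
def unproject(pixel, depth, focal, pp):
    """Back-project a pixel to a 3D point in the camera frame (pinhole model).

    Hypothetical helper for illustration; not the MASt3R `proj3d` function.
    """
    u, v = pixel
    cx, cy = pp
    return ((u - cx) / focal * depth, (v - cy) / focal * depth, depth)


def project(point, focal, pp):
    """Project a 3D camera-frame point back to pixel coordinates."""
    x, y, z = point
    cx, cy = pp
    return (focal * x / z + cx, focal * y / z + cy)


# Assumed example values: focal length in pixels and principal point.
focal, pp = 500.0, (320.0, 240.0)
pixel = (400.0, 300.0)
scale = 3.0

p1 = unproject(pixel, 2.0, focal, pp)          # point at depth 2
p2 = unproject(pixel, 2.0 * scale, focal, pp)  # same pixel, depth scaled by 3

# The 3D point is scaled by the same factor as the depth...
scaled_p1 = tuple(scale * c for c in p1)
# ...but both points reproject to the same pixel.
reproj1 = project(p1, focal, pp)
reproj2 = project(p2, focal, pp)
```

Because `reproj1` and `reproj2` coincide, image evidence alone cannot distinguish the two scales, so any per-frame scale must be carried by some other parameter of the model.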

Questions

  1. Are there plans to enable the incorporation of known camera poses?
  2. Is there consideration for changing the code design to facilitate easier integration of known camera poses?

I welcome any opinions or insights on this topic from other people.

Thank you!

resurgo97 commented 1 month ago

For the record, I tried to incorporate the known camera poses before the camera is shifted along the z-axis (in order to disentangle scale and focal length from the extrinsic matrix), but it failed.

Reed-yang commented 1 month ago

Hi, have you made any progress on using preset known poses with sparse_global_alignment? I'm working on the same issue...

resurgo97 commented 1 month ago

> Hi, have you made any progress on using preset known poses with sparse_global_alignment? I'm working on the same issue...

Unfortunately not ;( I think the current implementation is fundamentally not suited to known-camera-pose scenarios. If you only need inference, the fastest way seems to be to start from the DUSt3R code (which is entirely different from the MASt3R code) and modify it to add the additional heads that were introduced in MASt3R.

Reed-yang commented 1 month ago

Have you tried modifying the logic of the make_K_cam_depth function and then running sparse_alignment to optimize the depthmaps?

ljjTYJR commented 1 month ago

I think it is easy to pass in some known camera poses as a setting. As long as the fixed poses are not passed to the optimizer during the optimization, that should be OK.
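A toy sketch of that idea, in plain Python rather than the actual MASt3R optimizer: cameras live on a 1D line, the known poses are written into the state once, and only the remaining cameras are ever updated. The function `align_with_fixed_poses` and all names below are hypothetical and only illustrate "keep the known poses out of the set of optimized parameters"; they say nothing about how `sparse_global_alignment` parameterizes poses internally.

```python
def align_with_fixed_poses(init, known, edges, lr=0.1, steps=500):
    """Toy 1D alignment by gradient descent over the free cameras only.

    init  : dict camera -> initial position guess
    known : dict camera -> known position (held constant, never updated)
    edges : list of (i, j, d) relative measurements, meaning x[j] - x[i] ~= d
    Minimizes sum((x[j] - x[i] - d) ** 2) with respect to the free cameras.
    """
    x = dict(init)
    x.update(known)  # overwrite guesses with the known poses
    free = [c for c in x if c not in known]  # the only "optimizer parameters"
    for _ in range(steps):
        grad = {c: 0.0 for c in free}
        for i, j, d in edges:
            r = x[j] - x[i] - d  # residual of this measurement
            if j in grad:
                grad[j] += 2.0 * r
            if i in grad:
                grad[i] -= 2.0 * r
        for c in free:  # known cameras are never touched here
            x[c] -= lr * grad[c]
    return x


# Assumed example: cam0 and cam2 are known, cam1 must be estimated.
known = {"cam0": 0.0, "cam2": 2.0}
init = {"cam0": 5.0, "cam1": -3.0, "cam2": 7.0}
edges = [("cam0", "cam1", 1.0), ("cam1", "cam2", 1.0)]
result = align_with_fixed_poses(init, known, edges)
```

In this toy setup the free camera settles between its two fixed neighbours while the known ones stay exactly where they were set. Whether the same trick works in MASt3R depends on the concern raised earlier in the thread, namely that cam2w is entangled with focal length and scale.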