nerfstudio-project / nerfstudio

A collaboration friendly studio for NeRFs
https://docs.nerf.studio
Apache License 2.0

Camera optimizer didn't work #1017

Open Xiaxia1997 opened 1 year ago

Xiaxia1997 commented 1 year ago

@benji-york @corbt @erich666 @akanazawa @decrispell

Describe the bug
I've tried pose refinement on the storefront dataset and got the results below. The noise mentioned is --pipeline.datamanager.camera-optimizer.position-noise-std 0.15. From the table, I'm confused as to why pose refinement cannot recover the PSNR from 21 back to 28. Moreover, pose refinement does essentially nothing once noise is added.

| Condition | PSNR |
| --- | --- |
| without pose refinement | 28.3 |
| pose refinement | 28.7 |
| add noise only | 21.3 |
| add noise + pose refinement | 21.2 |
  1. So my first question is about the pose refinement: did I use it appropriately? Here are my commands:
# add noise, camera-optimizer
ns-train nerfacto --data=/data1/nerfstudio/data/storefront/   --vis tensorboard --viewer.websocket-port=5082 --pipeline.model.proposal-net-args-list.0.hidden-dim 64 --pipeline.model.proposal-net-args-list.0.log2-hashmap-size 22 --pipeline.model.proposal-net-args-list.0.num-levels 16 --pipeline.model.proposal-net-args-list.0.max-res 2048 --pipeline.datamanager.camera-optimizer.mode SO3xR3 --pipeline.datamanager.camera-optimizer.position-noise-std 0.15 nerfstudio-data 
# add noise, no camera-optimizer
ns-train nerfacto --data=/data1/nerfstudio/data/storefront/   --vis tensorboard --viewer.websocket-port=5082 --pipeline.model.proposal-net-args-list.0.hidden-dim 64 --pipeline.model.proposal-net-args-list.0.log2-hashmap-size 22 --pipeline.model.proposal-net-args-list.0.num-levels 16 --pipeline.model.proposal-net-args-list.0.max-res 2048 --pipeline.datamanager.camera-optimizer.position-noise-std 0.15  --pipeline.datamanager.camera-optimizer.mode off  nerfstudio-data 
  2. Lack of pose alignment:

    BARF works well for pose refinement. There are two strategies in BARF: one is coarse-to-fine encoding, the other is pose alignment. Have you ever considered adding these to nerfstudio? Especially the pose alignment, which accounts for the fact that a NeRF trained with refined poses may end up with a different scale, rotation, and translation from the ground truth. So BARF computes a similarity transform S between the GT poses and the refined noised poses.

    During training, the refined pose is transformed into the GT pose coordinate frame, aligned_refined_pose = refined_pose * S, and the error is computed between the aligned refined pose and the GT pose.

    During rendering, the input pose, which is regarded as a GT pose, is transformed into the refined noised pose coordinate frame, aligned_GT_pose = GT_pose * S.inverse, so we can use the aligned_GT_pose to render the image.

    I think pose alignment is important for pose refinement, but it is missing from nerfstudio. Would you consider adding it in a future release? A small sketch of what such an alignment could look like is below.
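For illustration, here is a minimal sketch of how such a similarity transform could be estimated between two sets of camera centers using the Umeyama method (this is just an example, not BARF or nerfstudio code; the function name, array shapes, and the use of camera centers only are my assumptions):

```python
import numpy as np

def umeyama_alignment(src, dst):
    """Estimate a similarity transform (s, R, t) mapping src to dst.

    src, dst: (N, 3) arrays of camera centers (e.g. refined vs. ground-truth),
    such that dst[i] ≈ s * R @ src[i] + t for each point.
    """
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / src.shape[0]          # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:  # avoid reflections
        S[2, 2] = -1.0
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / src.shape[0]
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t

# Example: align refined camera centers to GT centers before computing pose error.
# s, R, t = umeyama_alignment(refined_centers, gt_centers)
# aligned_centers = (s * (R @ refined_centers.T)).T + t
```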

akristoffersen commented 1 year ago

Hello! Sorry for the late reply. For 1., these results are a bit concerning, so I will look to see if we perhaps have a bug in how we apply the additional noise to the positions and rotations. The way you are using it is how we expect it to be used.

Just to clarify, are those PSNRs from validation or training views? If it's validation, there is no guarantee that the validation views will be accurate, as all the training poses could have shifted together during optimization. If this also shows up in training-view PSNRs, perhaps a std of 0.15 is too much for our naive camera optimization to handle. I know that was the amount used in BARF's experiments, but as you said, without the coarse-to-fine encoding that BARF uses, I could imagine its capacity to correct larger amounts of pose noise is limited.

For 2., I think this is certainly something we would want to add! I can look a bit deeper this week to see what an implementation would look like. Thanks for bringing this up!

Madaoer commented 1 year ago

@Xiaxia1997 May I ask how you get PSNR results with pose refinement using nerfstudio? I find that the images generated by nerfstudio don't align with the ground truth.

vincepapaix commented 1 year ago

Jumping on this topic: from this ticket https://github.com/nerfstudio-project/nerfstudio/issues/1101, thanks to @tancik we now have the dataparser_transforms.json file saved, which helps to go from the transforms.json scale to the trained world scale. https://github.com/nerfstudio-project/nerfstudio/pull/1105

I did try the flag --pipeline.datamanager.camera-optimizer.mode off

but I'm still seeing an overall shift of the NeRF output compared to the ground truth. Is there additional 'random noise' added at training time? I'm training the same scene several times with the same settings and the same camera_path.json for rendering, but every training run results in a slightly different shift for all objects. What is causing that?

Like @Madaoer, I'm after rendering NeRF output that exactly matches my ground truth, for quality-loss evaluation. This can be very informative and helpful for compositing applications, and then for enhancing a different camera move generated by the NeRF.

Any ideas, let me know. Thanks!

vincepapaix commented 1 year ago

Just confirming here that, using --pipeline.datamanager.camera-optimizer.mode off and the same camera_path.json, I do get the same position.

I'm working on a way to combine transforms.json and dataparser_transforms.json to create a new camera_path.json, so that the render is a 1:1 match with the ground truth dataset. Still no luck though; I'm seeing some slight distortion in the world in the output vs. the ground truth.
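In case it helps anyone trying the same thing, here is a rough sketch of the combination I'm attempting, assuming dataparser_transforms.json holds a 3x4 "transform" and a scalar "scale" (the fields recent nerfstudio versions write) and transforms.json holds the original camera-to-world matrices; paths and field access are illustrative only:

```python
import json
import numpy as np

# Dataparser output: a 3x4 transform and a scalar scale that map the original
# transforms.json world space into the trained (nerfstudio) world space.
with open("dataparser_transforms.json") as f:
    dp = json.load(f)
transform = np.array(dp["transform"])                       # 3x4
scale = dp["scale"]
transform_h = np.vstack([transform, [0.0, 0.0, 0.0, 1.0]])  # make it 4x4

# Original camera-to-world matrices.
with open("transforms.json") as f:
    meta = json.load(f)

trained_space_poses = []
for frame in meta["frames"]:
    c2w = np.array(frame["transform_matrix"])  # 4x4, original world space
    c2w_ns = transform_h @ c2w                 # rotate/translate into trained space
    c2w_ns[:3, 3] *= scale                     # rescale translation to trained world scale
    trained_space_poses.append(c2w_ns)         # poses to feed into a camera path for rendering
```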

Madaoer commented 1 year ago

@vincepapaix Well, maybe I solved this problem. If you turn the camera optimizer off, the output and ground truth should be aligned; if you turn it on, you should use the transform matrix optimized by the camera optimizer to get the right pose.

idouros commented 1 year ago

@Xiaxia1997 @Madaoer @vincepapaix Has anyone actually managed to do this with any degree of success? I am facing a similar problem: I am trying to render a set of outputs that matches my input images. From training and exporting I get the following:

  1. A dataparser transform, say M_dp, saved in dataparser_transforms.json
  2. A set of transforms M[i], saved in transforms.json
  3. Another set of transforms C[i], saved in json files generated when I run ns-export cameras
  4. The camera intrinsics that I can use to build the final projection matrix M_int to the image frame of reference.

...and I run the training with --pipeline.datamanager.camera-optimizer.mode off

Now, what to do with #1 and #4 above is pretty straightforward, but #2 and #3 are a bit of a puzzle. I'm not sure which of the two I should use, and how (maybe a combination of both), to get what is effectively the camera extrinsics; I have tried every sensible option without success. I can see from my renders that the camera origin and the scaling are correct, but the rotation evades me - it appears to be consistently wrong in the same way.

Any hints on how to align the output to the input correctly would be greatly appreciated. Thanks in advance!

vincepapaix commented 1 year ago

Hi Idouros

Maybe this workflow will help: https://docs.nerf.studio/en/latest/quickstart/blender_addon.html

Your approach is correct: you need to combine the matrix and scale from transforms.json with dataparser_transforms.json; this gives the correct positions for the render json you need to use. And yes, with the camera optimizer OFF.

If the scale is correct but the rotation is incorrect, it might just be a math problem when you combine the matrices?
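One thing worth double-checking (just a guess on my part): transforms.json stores camera-to-world matrices in the OpenGL/Blender convention (x right, y up, z pointing backwards out of the image), while most extrinsics/projection code expects OpenCV-style world-to-camera matrices (y down, z forward). Mixing the two gives exactly this kind of consistent rotation error. A small sketch of the conversion, purely for illustration:

```python
import numpy as np

def c2w_opengl_to_w2c_opencv(c2w):
    """Convert an OpenGL-style camera-to-world matrix (as in transforms.json)
    into an OpenCV-style world-to-camera extrinsic matrix."""
    c2w = np.asarray(c2w, dtype=np.float64).copy()
    c2w[:3, 1:3] *= -1.0          # flip the y and z camera axes
    return np.linalg.inv(c2w)     # invert to get world-to-camera
```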

idouros commented 1 year ago

Hi @vincepapaix,

Thank you for your help. After repeated experimentation I got it right eventually and yes, it was just a math problem.

Also, since the latest versions of nerfstudio and the introduction of the ns-export cameras feature, it is possible to combine the matrices from the exported camera jsons (transforms_train.json and transforms_eval.json) with dataparser_transforms.json in the same way, and this works with the camera optimizer on (leave the default value).

MaxChanger commented 1 year ago

Hi @idouros, I have also been using ns-export cameras recently, and I'm a little confused. Are the exported camera poses in the transforms_train/eval.json files the optimized poses after training, or are they the same as the original input poses in transforms.json?

After I use the transform and scale in dataparser_transforms.json to recover the raw camera poses, they are basically the same as the poses in the original input: there is essentially no difference in the quaternions, and the differences in translation are on the order of 1e-4. And I made sure that camera optimization was turned on during training. Do you have any tips? Thanks.

vrahnos3 commented 1 year ago

Hello @idouros and everyone, I have some questions about poses.

  1. Can you explain the relationship between transforms.json and camera_path.json (from the command ns-export cameras)?
  2. I also want to know how to export the dataparser_transforms.json file and what its usage is.
  3. Do either transforms.json or camera_path.json contain the poses that are visualized in the viewer during training?

Thank you!