nerfstudio-project / nerfstudio

A collaboration friendly studio for NeRFs
https://docs.nerf.studio
Apache License 2.0

Transform.json and camera_path.json different world position? #1101

Closed vincepapaix closed 1 year ago

vincepapaix commented 1 year ago

Hi, looking at both transforms.json and camera_path.json, the world matrices appear different.

For analyzing the loss of quality in the NeRF output, it would be useful to render exactly the same images as the training frames.

I tried creating a camera path using the training views in the UI; even after adjusting the FOV I don't get a match. The world matrix values are very different between transforms.json and the camera_path.json created in the UI. Do we know the correlation needed to go from transforms.json to camera_path.json (units, FOV, etc.)?

Comparing with NVIDIA's instant-ngp code: it has a flag called --screenshot_transforms, and by providing the transforms.json file I was able to render exactly the same images as my dataset, which is very valuable for seeing the difference.

Is there a way to render using the transforms.json?

thanks for your help

tancik commented 1 year ago

Yea, this workflow isn't ideal. The issue is that we automatically scale and orient the scene prior to training in order to get better results. We should probably save this transformation somewhere so that it can be applied to custom paths. You can disable this transform by adding the following to the end of your training command: `nerfstudio-data --orientation-method none --center-poses False --auto-scale-poses False`. This will expect the poses to roughly be in a +-1 bounding box. One other issue you may run into is that we do pose optimization during training to account for noise in the poses. This will result in a shift. You can disable this by adding `--pipeline.datamanager.camera-optimizer.mode off` to the command, but note that the quality may decrease (even though the PSNR increases). A fancier solution would be to optimize the pose of your test image to match the scene, but this is a bit more complex.
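For illustration, here is a minimal sketch of how such a saved world transform could be applied to the poses in transforms.json once it is exposed. The matrix `T` and scale `s` below are placeholders, not values nerfstudio currently writes out:

```python
import json
import numpy as np

# Placeholder values: in practice these would be the orientation/centering
# transform and scale factor that the dataparser applied before training.
T = np.eye(4)   # 4x4 world transform (hypothetical)
s = 1.0         # uniform scale factor (hypothetical)

with open("transforms.json") as f:
    meta = json.load(f)

for frame in meta["frames"]:
    c2w = np.array(frame["transform_matrix"])  # 4x4 camera-to-world pose
    c2w = T @ c2w                              # re-orient/center like the dataparser
    c2w[:3, 3] *= s                            # scale translations into the training scene
    frame["transform_matrix"] = c2w.tolist()

with open("transforms_scaled.json", "w") as f:
    json.dump(meta, f, indent=4)
```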

vincepapaix commented 1 year ago

Thanks Tancik for your reply, really appreciated.

Saving the transformation would be valuable so we can apply it to our transforms.json. Thinking about other requests (like camera FBX import/export), keeping the NeRF in the same scale and orientation as other 3D packages (like Blender) can help with 3D animation and VFX work.

I'll give those flags a try and see how it goes.

tancik commented 1 year ago

There is a discussion regarding FBX in #1016. TL;DR: not sure how to add it without making the install more complex; looking for help/suggestions.

vincepapaix commented 1 year ago

I ran into an error running `nerfstudio-data --orientation-method none --center-poses False --auto-scale-poses False`.

This was my full command: `ns-train nerfacto --data data/mydata nerfstudio-data --orientation-method none --center-poses False --auto-scale-poses False`

Did I do something wrong?

And if there is a way to save the world transformation that happens at training time, that would probably help me the most to preserve the best output.

let me know

tancik commented 1 year ago

What was the error? Yea, we will put saving the world transformation on our list of todos.

vincepapaix commented 1 year ago

Thanks, keep me posted about the world transformation.

My error might be related to something I'm doing on my side; I'll report back here if I can't figure it out.

wuzirui commented 1 year ago

It seems this problem can be solved with the help of the `ns-export cameras` command. The rough procedure to render all images from the training set and the eval set is as follows:

1. Export all camera transformations:

   ```bash
   ns-export cameras --load-config /path/to/config.yml --output-dir poses
   ```

2. Convert the exported cameras to the camera_path.json format; reference script:

   ```python
   import json
   from pathlib import Path

   def ind(c2w):
       # Pad a 3x4 camera-to-world matrix to a 4x4 homogeneous matrix.
       if len(c2w) == 3:
           c2w += [[0, 0, 0, 1]]
       return c2w

   train_transforms = json.loads(open('poses/transforms_train.json').read())
   eval_transforms = json.loads(open('poses/transforms_eval.json').read())
   transforms = train_transforms + eval_transforms
   transforms = sorted(transforms, key=lambda x: int(Path(x['file_path']).stem))

   out = {
       'camera_type': 'perspective',
       'render_height': 1080,
       'render_width': 1920,
       'seconds': len(transforms),
       'camera_path': [
           {'camera_to_world': ind(pose['transform']), 'fov': 50, 'aspect': 1, 'file_path': pose['file_path']}
           for pose in transforms
       ],
   }

   outstr = json.dumps(out, indent=4)
   with open('camera_path.json', mode='w') as f:
       f.write(outstr)
   ```

3. Render the images (and depth images in my case):

   ```bash
   ns-render --load-config /path/to/config.yml --rendered-output-names rgb depth --traj filename --camera-path-filename camera_path.json --output-format images --output-path renders
   ```

elenacliu commented 1 year ago

@wuzirui Hi, thank you for your script, but I'm wondering why the 'fov' and 'aspect' in the camera_path.json file would be the same (50 and 1 in your script) for all different situations. (Besides, I find that code in this repo, e.g. the snippet linked below, also generates the camera path with fixed settings.)

https://github.com/nerfstudio-project/nerfstudio/blob/bc8341378a8001b2b116d3461c56453c13b1abac/nerfstudio/scripts/blender/nerfstudio_blender.py#L145-L175

Maybe the camera intrinsics will differ for each custom dataset. If users want to generate images just to compare with the ground-truth depth or RGB images, they should keep the camera intrinsics consistent with their original dataset.
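As a rough sketch of what that could look like: assuming a nerfstudio-style transforms.json with shared intrinsics (fl_y, w, h), and assuming the camera path's 'fov' is a vertical field of view in degrees, the values could be derived per dataset instead of hard-coded:

```python
import json
import math

with open('transforms.json') as f:
    meta = json.load(f)

# Shared intrinsics from the nerfstudio/instant-ngp style transforms.json;
# some datasets store these per frame instead.
fl_y, w, h = meta['fl_y'], meta['w'], meta['h']

fov = math.degrees(2 * math.atan(h / (2 * fl_y)))  # vertical FOV in degrees
aspect = w / h

print(f"fov={fov:.2f} deg, aspect={aspect:.3f}")
```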

wuzirui commented 1 year ago

> @wuzirui Hi, thank you for your script, but I'm wondering why the 'fov' and 'aspect' in the camera_path.json file would be the same (50 and 1 in your script) for all different situations. (Besides, I find that code in this repo, e.g. the snippet linked below, also generates the camera path with fixed settings.)
>
> https://github.com/nerfstudio-project/nerfstudio/blob/bc8341378a8001b2b116d3461c56453c13b1abac/nerfstudio/scripts/blender/nerfstudio_blender.py#L145-L175
>
> Maybe the camera intrinsics will differ for each custom dataset. If users want to generate images just to compare with the ground-truth depth or RGB images, they should keep the camera intrinsics consistent with their original dataset.

The FOV and aspect are the same across my data; this script is just an example :)

elenacliu commented 1 year ago

@wuzirui thx for your instant reply! I see~

yurkovak commented 1 year ago

For anyone coming back here, note that since https://github.com/nerfstudio-project/nerfstudio/pull/2459 there is:

```bash
ns-render dataset --load-config /path/to/config.yml --output-path /save/folder --split train
```

It's more correct to use this than the script shared above, because `ns-render camera-path` assumes the optical center is at the image center. This is not always true and may result in images shifted relative to the dataset.
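A quick way to check whether that assumption holds for a given dataset is to compare the principal point stored in transforms.json against the image center (a sketch, assuming top-level cx/cy/w/h keys):

```python
import json

with open('transforms.json') as f:
    meta = json.load(f)

cx, cy, w, h = meta['cx'], meta['cy'], meta['w'], meta['h']

# Non-negligible offsets here mean a camera-path render that assumes the
# principal point at the image center will be slightly shifted vs. the dataset.
print(f"offset x: {cx - w / 2:.2f} px, offset y: {cy - h / 2:.2f} px")
```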