nerfstudio-project / nerfstudio

A collaboration friendly studio for NeRFs
https://docs.nerf.studio
Apache License 2.0

Issue: Rendered Image Not Visible in Viewer #2073

Closed (msanchezvicom closed this issue 1 year ago)

msanchezvicom commented 1 year ago

Hello,

I am attempting to train the model on a custom dataset constructed from a synthetic autonomous driving environment. The idea is to use the four vehicle-mounted cameras to render the 3D scene surrounding the vehicle. The images are ordered left_cam, front_cam, right_cam, back_cam, which ensures overlapping regions between adjacent views. Additionally, I use stepped frames rather than consecutive ones, so that different frames still share common content from different perspectives. In total, 32 images are used. The image below shows an example for the first frame of each camera (left, front, right, back):

[Screenshot: first frame from each camera (left, front, right, back)]

I am processing the data with the following command: ns-process-data images --camera-type fisheye --data data/nerfstudio/images/ --output-dir data/nerfstudio/output/

Everything seems to be functioning correctly: [Screenshot: processing output]

For training the method, I am using the following command: ns-train nerfacto --data data/nerfstudio/output/ --viewer.skip-openrelay True

The training procedure executes without issues. However, when I access the viewer, the rendered image is not visible and appears completely black: [Screenshot: viewer showing a black render]

As a test, I tried using only one camera POV (the front one); parts of the scene were visible in the render, but with heavy blur and artifacts:

[Image: render from the front camera only, showing blur and artifacts]

I would greatly appreciate any suggestions on how to enhance my results or insights into any mistakes I might be making.

Thank you very much for your assistance!

tancik commented 1 year ago

The issue is the black borders in the images. We don't have a super clean way of handling this at the moment. You will either need to create circular masks (though this will slow things down a bit), or crop your images so that the borders aren't visible prior to processing the data and training.
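
(For anyone hitting the same problem: below is a minimal sketch of how such a circular mask could be generated with NumPy and Pillow. The resolution, center, and radius are placeholder assumptions to adapt to your cameras; per the data conventions docs, pixels where the mask is 0 are ignored during training.)

import numpy as np
from PIL import Image

# Assumed image resolution; replace with your cameras' actual size.
W, H = 1280, 720

# Mark pixels inside the fisheye circle as valid (255) and the
# black corners as invalid (0).
yy, xx = np.mgrid[0:H, 0:W]
cx, cy = W / 2.0, H / 2.0
radius = min(cx, cy)  # shrink slightly if black borders still leak in
inside = (xx - cx) ** 2 + (yy - cy) ** 2 <= radius ** 2
mask = inside.astype(np.uint8) * 255

Image.fromarray(mask).save("mask.png")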

msanchezvicom commented 1 year ago

Thank you for your answer, @tancik. I have created the masks for the project using the following image: [Mask image: mask_carla_255]

By considering only the front camera images, the results have improved significantly. Here's a screenshot as an example:

[Screenshot: improved render using only front-camera images with masks]

However, I believe the poses are being computed incorrectly: the cameras should be positioned one in front of the other along a straight line. Currently, the cameras are placed as shown in the following screenshot:

[Screenshot: estimated camera poses in the viewer]

I have the intrinsics and extrinsics (in the world coordinate system) for each camera, but the axes are defined as X (front), Y (left), Z (up), which differs from the OpenCV convention. An example camera pose:

[Image: example camera pose]

I would highly appreciate any help or suggestions on how I can improve the camera pose calculation. Thank you very much for your assistance.
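
(A note on conventions, since this is a common pitfall when supplying your own poses: nerfstudio's transforms.json expects camera-to-world matrices with OpenGL-style camera axes, i.e. X right, Y up, Z pointing backward out of the image. Below is a minimal sketch of the axis change, assuming the simulator provides camera-to-world poses whose camera axes follow the X (front), Y (left), Z (up) convention described above; the variable names and identity pose are placeholders.)

import numpy as np

# Placeholder: a camera-to-world pose from the simulator, with camera
# axes X (front), Y (left), Z (up). Replace with the real 4x4 matrix.
c2w_body = np.eye(4)

# Each column expresses an OpenGL camera axis in the simulator frame:
#   GL +X (right)    = -Y (left)
#   GL +Y (up)       = +Z (up)
#   GL +Z (backward) = -X (front)
body_from_gl = np.array([
    [ 0.0, 0.0, -1.0, 0.0],
    [-1.0, 0.0,  0.0, 0.0],
    [ 0.0, 1.0,  0.0, 0.0],
    [ 0.0, 0.0,  0.0, 1.0],
])

# The result is what belongs in each frame's transform_matrix.
c2w_gl = c2w_body @ body_from_gl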

P.S.: Do you think I should continue using single-camera images, or should I switch to using all four cameras instead?

machenmusik commented 1 year ago

FYI, since you have a small number of images, https://github.com/nerfstudio-project/nerfstudio/issues/2075#issuecomment-1591890732 may help speed up training for your masked dataset.

machenmusik commented 1 year ago

You may be running into this https://github.com/nerfstudio-project/nerfstudio/issues/2055#issuecomment-1586435943

msanchezvicom commented 1 year ago

@machenmusik Thank you for your message. I'm not entirely sure I understand you correctly. Are you suggesting that I disable the camera optimization procedure? If so, could you please provide instructions on how to do that?

Additionally, I mentioned earlier that I was testing with a small set of 32 images. However, I also have a larger dataset that includes 1997 images per camera.

Thank you once again!

machenmusik commented 1 year ago

I believe it is --pipeline.datamanager.camera-optimizer.mode off from https://github.com/nerfstudio-project/nerfstudio/issues/1017#issuecomment-1347776179

msanchezvicom commented 1 year ago

@machenmusik, thank you for your message. Unfortunately, I am not getting better results: [Image: render with camera optimization disabled]

I am running the training command as follows: ns-train nerfacto --data data/nerfstudio/carla_front_few/output/ --viewer.skip-openrelay True --pipeline.datamanager.camera-optimizer.mode off

Just to recap: I have four fish-eye lens cameras attached to a vehicle, capturing a total of 1997 frames per camera. I am evaluating two small sets of images:

  1. 32 images covering the four points of view, with overlapping regions between consecutive images.
  2. 32 images from a single camera (the front camera, the one from the example above).

Moreover, since I am generating this dataset, I know the positions of these cameras with respect to the world frame, as well as their intrinsics (though I am not using this information; I am using the values provided by COLMAP). I am also using the mask from my earlier comment during the training phase.

For processing the data, I use the following command:

ns-process-data images --data <in_data> --output-dir <out_data> --camera-type fisheye --matching-method exhaustive --num-downscales 1

Do you think I should manage my data differently? I would highly appreciate any help on this topic since achieving this 3D scene reconstruction is very important for my research.

Olimoyo commented 1 year ago

Hi @msanchezvicom, I am also struggling with training a NeRF using my own pose data. In my case, it works when I use the poses from COLMAP, so I'm assuming I'm doing something wrong in my pose processing.

Does it work for you if you use the poses calculated from COLMAP?

msanchezvicom commented 1 year ago

Hi @Olimoyo , @machenmusik , @tancik

I'm happy to share that I have finally achieved proper results. In my case, the ns-process-data step was providing inaccurate camera poses. I believe this occurred because my four cameras have overlapping areas between consecutive cameras but do not capture the same object/scene, so COLMAP was unable to estimate accurate poses. As I mentioned before, I have access to the relative poses of the cameras on the vehicle, as well as its odometry. Using this information, I infer the cameras' poses in the world coordinate system and adapt them to the transforms.json file format. I have attached a short video demonstrating the current state. Although there is room for improvement, I consider this a significant achievement.

[Video: demo_rec.webm]
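
(For others attempting the same, here is a rough sketch of writing such a transforms.json from known intrinsics and camera-to-world poses that have already been converted to nerfstudio's convention. All values below are placeholders, and the fisheye distortion coefficients k1-k4 are omitted for brevity.)

import json
import numpy as np

# Placeholder intrinsics; use your cameras' actual calibration.
fl_x, fl_y, cx, cy = 400.0, 400.0, 640.0, 360.0
w, h = 1280, 720

# Placeholder poses: image name -> 4x4 camera-to-world matrix,
# already in nerfstudio's OpenGL convention.
poses = {"frame_00000.png": np.eye(4)}

transforms = {
    "camera_model": "OPENCV_FISHEYE",  # fisheye lenses
    "fl_x": fl_x,
    "fl_y": fl_y,
    "cx": cx,
    "cy": cy,
    "w": w,
    "h": h,
    "frames": [
        {
            "file_path": f"images/{name}",
            "mask_path": "mask/mask_carla.png",  # same mask for every frame
            "transform_matrix": c2w.tolist(),
        }
        for name, c2w in poses.items()
    ],
}

with open("transforms.json", "w") as f:
    json.dump(transforms, f, indent=4)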

I sincerely appreciate all of your help and suggestions.

zhan-xu commented 1 year ago

@msanchezvicom Could I ask how you incorporated the mask to get rid of the black borders? I didn't see that in your comments. Thanks!

msanchezvicom commented 1 year ago

Hey @zhan-xu,

No problem! In the transforms.json file, I've specified the mask path for each frame. Take a look at this image for reference: [Image: transforms.json excerpt showing the mask_path field] For more detailed specifications, see https://docs.nerf.studio/en/latest/quickstart/data_conventions.html#masks.

Let me know if you need any further assistance!

zhan-xu commented 1 year ago

@msanchezvicom thanks for the answer. As far as I understand, I need to write a script that loads the original transforms.json, adds a mask_path for each frame, and saves the JSON file. Is that correct? I will try this first, thank you!

msanchezvicom commented 1 year ago

Hi @zhan-xu ,

Yes, exactly! I hope this might help:

import json

# Read the JSON file
with open('transforms.json', 'r') as json_file:
    data = json.load(json_file)

# Add "mask_path" field to each frame
for frame in data['frames']:
    frame['mask_path'] = 'mask/mask_carla.png' # in my case, same mask for each frame

# Save the modified JSON back to the file
with open('transforms.json', 'w') as json_file:
    json.dump(data, json_file, indent=4)

Best regards

JackieZhuzzz commented 1 year ago

Hi @machenmusik

I have the accurate poses of the cameras in my dataset, and I adapted them to the transforms.json file format. However, after training for 30000 iterations, the rendered resolution is still extremely low. How did you render the scene so clearly? Are there any other parameters we need to set for fisheye images? Thanks!

zhan-xu commented 1 year ago

Hi @JackieZhuzzz, I also have this issue. I checked the camera poses, which seem good to me, but the results are still not good even though I applied the mask as @msanchezvicom suggested. Please let me know if you find out something. Thank you!