zju3dv / ENeRF

SIGGRAPH Asia 2022: Code for "Efficient Neural Radiance Fields for Interactive Free-viewpoint Video"
https://zju3dv.github.io/enerf

Results on ENeRF-Outdoor dataset and poor quality depth #20

Open ricshaw opened 1 year ago

ricshaw commented 1 year ago

Hi, thanks for the great work! However, after running your training script (python train_net.py --cfg_file configs/enerf/enerf_outdoor/actor1.yaml) on Actor1 for 50 epochs, I am getting the following results. The results for the color prediction are not as good as advertised on your project page, with lots of warping of the background. Also, the depth maps are quite poor, with the depth of the shadow region being incorrectly predicted. Do you know why this might be?

(attached rendered frame: actor1_0800_0_800)

Color: https://user-images.githubusercontent.com/9107279/219429442-24e2cc1d-bb5b-4d78-9f58-588e318fdbaa.mp4

Depth: https://user-images.githubusercontent.com/9107279/219429583-eccd8139-173f-4a6c-b4c6-26d0e83e5db9.mp4

haotongl commented 1 year ago

Hi, thanks for your attention. For the color prediction of the background, I think setting input_views_num to 4 will give better results. (The video on the project page was rendered with input_views_num = 4.) https://github.com/zju3dv/ENeRF/blob/master/configs/enerf/enerf_outdoor/actor1_path.yaml#L5
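For reference, this amounts to a one-line change in the config. A minimal excerpt, assuming input_views_num sits at the top level of actor1.yaml the way it does on the linked line of actor1_path.yaml (surrounding keys omitted):

```yaml
# configs/enerf/enerf_outdoor/actor1.yaml (excerpt; other keys omitted)
input_views_num: 4  # actor1_path.yaml#L5 uses 4; the project-page video was rendered with this
```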

For the depth prediction in shadow regions, try setting the background rendering mode to foreground-image blending, i.e. pass `src_inps=batch['src_inps']` at https://github.com/zju3dv/ENeRF/blob/master/lib/networks/enerf/network_composite.py#L139. This produces more reasonable depth predictions, but it will introduce some ghosting artifacts near the people.
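For readers following along, the suggestion boils down to choosing which images feed the background branch. A minimal sketch of that toggle, where render_fn, batch['bg_inps'], and the wrapper itself are hypothetical illustrations; only the src_inps=batch['src_inps'] substitution comes from the comment above:

```python
def render_background(render_fn, batch, fg_blending=True):
    """Pick the source images for the background branch (illustrative sketch).

    With fg_blending=True the background is rendered from the original
    foreground-containing inputs, so cast shadows are visible to the depth
    network and shadow depth becomes more reasonable; the trade-off, as
    noted above, is ghosting artifacts near the people.
    """
    src_inps = batch['src_inps'] if fg_blending else batch['bg_inps']
    return render_fn(src_inps=src_inps)
```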

chky1997 commented 1 year ago

Following the instructions, I also get a poor-quality result video on the outdoor dataset, even after setting input_views_num to 4. The PSNR after 50 epochs is about 27, which is far lower than on the ZJU-MoCap dataset. The modified parameters are shown in the attached screenshots.

haotongl commented 1 year ago

ZJU-MoCap yields a high PSNR because it is a simple dataset. ENeRF-Outdoor is more challenging; a PSNR of 27 is not low for this dataset. If your rendering results look similar to the project page, that is a good indication that your usage is correct.
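For context on the numbers: PSNR is 10 * log10(MAX^2 / MSE) per frame, so 27 dB corresponds to a per-pixel RMSE of roughly 4.5% of the intensity range. A minimal sketch of the standard definition (this is not ENeRF's evaluation code):

```python
import numpy as np

def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio between a rendered frame and ground truth."""
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```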

chky1997 commented 1 year ago

Thank you for your explanation! However, my rendering result is far poorer than the project page; the quality is basically the same as what @ricshaw posted in this issue. I compared the properties of the saved videos: the project-page video is 48 MB, while the video I saved from run.py is only 8 MB. Could the quality difference be caused by the video saving process?

https://user-images.githubusercontent.com/62194406/223611717-692d0d1f-2550-48e6-b65e-7c92a137a296.mp4
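One way to rule the encoder in or out is to re-save the same rendered frames at the highest quality and compare file size and appearance. A minimal sketch using imageio's ffmpeg writer; the dummy frames and the quality setting are illustrative, and this is not how run.py actually saves video:

```python
import imageio
import numpy as np

# Dummy frames standing in for the renderer's output (illustrative only).
frames = [np.random.randint(0, 255, (512, 512, 3), dtype=np.uint8)
          for _ in range(30)]

# quality ranges 0-10 for imageio's ffmpeg backend; higher quality gives
# larger files. If bitrate alone explained the 48 MB vs 8 MB gap, the same
# frames re-saved at quality=10 would look visibly better.
with imageio.get_writer('rerender_hq.mp4', fps=30, quality=10) as writer:
    for frame in frames:
        writer.append_data(frame)
```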

haotongl commented 1 year ago

Thanks. I don't think the saving process affects the quality much. The quality of this rendered video is approaching that of the video on the project page. The main artifacts seem to come from the edges of the picture, which may be because those regions are unseen in the input views.

There are some differences between the released code and the code used to generate the rendered videos on the project page. The released code uses a bounding box generated by the visual hull, while the earlier code used a bounding box derived from a rough estimate of the 3D keypoints of the human body. I will try to release the model and the corresponding bounding boxes to help you fully reproduce the rendering on the project page.

chky1997 commented 1 year ago

Thank you so much for your help!