different result of standing and sitting person

zju3dv / ENeRF

SIGGRAPH Asia 2022: Code for "Efficient Neural Radiance Fields for Interactive Free-viewpoint Video"

https://zju3dv.github.io/enerf

different result of standing and sitting person #19

Closed chky1997 closed 1 year ago

chky1997 commented 1 year ago

Hi, I built two datasets, one of a standing person and one of a sitting person. Training on the standing data gives a much better result than training on the sitting data (about 4 dB higher PSNR). I noticed that someone reported that OpenPose achieves better results on standing people than on sitting people: https://github.com/zju3dv/EasyMocap/issues/94 Could that be why ENeRF produces different results? Thank you!

haotongl commented 1 year ago

ENeRF uses the SMPL vertices to compute the near/far bounds, and it will produce poor renderings if those bounds are wrong. You can project the bounding box to 2D, or visualize it in 3D, to check whether the near/far bounds are correct.
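The check described above can be sketched as follows. This is a minimal NumPy illustration, not ENeRF's actual code: the function names, the `pad` margin, the toy vertices, and the camera convention `x_cam = R @ x_world + T` with a pinhole intrinsic matrix `K` are all assumptions made for the example.

```python
import numpy as np

def near_far_from_vertices(vertices, R, T, pad=0.05):
    """Depth range of world-space vertices seen by one camera.

    Assumes x_cam = R @ x_world + T and that depth is the camera-space z.
    `pad` is a hypothetical safety margin in scene units.
    """
    cam = vertices @ R.T + T.reshape(1, 3)
    depth = cam[:, 2]
    near = max(depth.min() - pad, 1e-3)  # keep near strictly positive
    far = depth.max() + pad
    return near, far

def bbox_corners(vertices, pad=0.05):
    """Eight corners of the padded axis-aligned bounding box."""
    lo = vertices.min(axis=0) - pad
    hi = vertices.max(axis=0) + pad
    return np.array([[x, y, z]
                     for x in (lo[0], hi[0])
                     for y in (lo[1], hi[1])
                     for z in (lo[2], hi[2])])

def project(points, K, R, T):
    """Project world-space points to pixel coordinates (pinhole model)."""
    cam = points @ R.T + T.reshape(1, 3)
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3]

# Toy stand-in for SMPL vertices: the corners of a 1.0 x 2.0 x 0.6 box.
vertices = np.array([[x, y, z]
                     for x in (-0.5, 0.5)
                     for y in (-1.0, 1.0)
                     for z in (-0.3, 0.3)])

# Hypothetical camera 3 units in front of the subject, 1024 x 1024 image.
R = np.eye(3)
T = np.array([0.0, 0.0, 3.0])
K = np.array([[1000.0, 0.0, 512.0],
              [0.0, 1000.0, 512.0],
              [0.0, 0.0, 1.0]])

near, far = near_far_from_vertices(vertices, R, T)
uv = project(bbox_corners(vertices), K, R, T)
inside = bool(np.all((uv >= 0) & (uv < 1024)))
print(f"near={near:.3f} far={far:.3f} bbox fully inside image: {inside}")
```

If the projected corners fall far outside the person in the image, or `near`/`far` do not bracket the subject's actual depth, the SMPL fit for that frame is a likely suspect.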

chky1997 commented 1 year ago

Thank you for your reply. I have another question about the dataset. In the ZJU-MoCap dataset, 21 cameras cover a 360° view of the person. Is there any material about how to decide the locations and directions of the cameras, especially when the number of cameras is limited?

haotongl commented 1 year ago

Hi, by "decide", do you mean "calibrate"? Here is a great document that may help you calibrate your cameras: https://chingswy.github.io/easymocap-public-doc/quickstart/calibration.html

chky1997 commented 1 year ago

Sorry, I mean the arrangement of the 21 cameras: how to place them to get better, or more suitable, data for ENeRF training. In ZJU-MoCap, the 21 cameras cover 360° at basically the same height. As an arbitrary example, suppose the 21 cameras were rearranged at 3 height levels (high, medium, low), with 7 cameras covering 360° at each level. The larger spacing between neighboring cameras would reduce the repeatability of the images for ENeRF training, but more vertical information would be captured. Would the result get better or worse? In other words, why 21 cameras? Are 10 or 15 cameras not enough? Is there any material, or any of your own experiments, that discusses this kind of question? Thank you!

haotongl commented 1 year ago

In practice, I have found that high repeatability is very important for ENeRF training. I have not run precise experiments on this point, but some related experiments hint at it: on the ZJU-MoCap dataset, there is a visible quality gap between training with 21 cameras and training with 11 cameras.

If you have a limited number of cameras, I suggest placing them in a horizontal circular arrangement, similar to the ENeRF_Outdoor setup: https://github.com/zju3dv/ENeRF/blob/master/docs/enerf_outdoor.md
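To make the trade-off discussed in this thread concrete, here is a rough sketch of a horizontal camera ring and the angular spacing it implies. It is only an illustration: `look_at`, `camera_ring`, and `angular_spacing_deg` are hypothetical helpers, the world is assumed to be z-up, and the extrinsics follow the OpenCV-style convention `x_cam = R @ x_world + T` (+x right, +y down, +z forward). None of this is taken from the ENeRF or EasyMocap code.

```python
import numpy as np

def look_at(cam_pos, target, up=np.array([0.0, 0.0, 1.0])):
    """World-to-camera rotation R and translation T for a camera at
    cam_pos looking at target (assumes the view direction is not
    parallel to `up`)."""
    forward = target - cam_pos
    forward = forward / np.linalg.norm(forward)
    right = np.cross(forward, up)
    right = right / np.linalg.norm(right)
    down = np.cross(forward, right)
    R = np.stack([right, down, forward])  # rows are the camera axes
    T = -R @ cam_pos
    return R, T

def camera_ring(n, radius, height, target):
    """n cameras evenly spaced on a horizontal circle around target.

    `height` is the vertical offset of the ring relative to the target.
    Returns a list of (position, R, T) tuples.
    """
    angles = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    poses = []
    for a in angles:
        pos = target + np.array([radius * np.cos(a),
                                 radius * np.sin(a),
                                 height])
        R, T = look_at(pos, target)
        poses.append((pos, R, T))
    return poses

def angular_spacing_deg(n):
    """Angle between neighboring cameras on a full 360° ring."""
    return 360.0 / n

target = np.array([0.0, 0.0, 1.0])  # roughly the subject's center
poses = camera_ring(21, radius=3.0, height=0.0, target=target)

# One ring of 21 cameras versus three rings of 7 cameras each.
print(f"21 per ring: {angular_spacing_deg(21):.1f} deg between neighbors")
print(f" 7 per ring: {angular_spacing_deg(7):.1f} deg between neighbors")
```

Splitting the same 21 cameras across three height levels triples the horizontal angle between neighbors within each ring, which is the loss of overlap between neighboring views that the question above refers to.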

chky1997 commented 1 year ago

Thank you for your explanation!