Thanks! Most of the videos we showed come from Adobe Stock (but I am no longer at Adobe), so I don't think I can release all the data due to licensing issues.
In terms of masks, in summary, I believe mask quality will not have a strong influence. They are used only for hard-mining the data-driven priors, not for directly telling the network which regions are non-rigid. For videos without thin moving objects such as limbs or hands far from the camera, turning off the coarse mask initialization can still give us good results (even if there are moving shadows).
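Roughly, the idea is something like the sketch below (the function name and weighting are only illustrative, not the exact loss code in the repo):

```python
import torch

def masked_prior_loss(pred, prior, coarse_mask, hard_weight=5.0):
    """L1 loss against a data-driven prior (e.g. monocular depth or optical flow),
    up-weighted inside the coarse motion mask. The mask only re-weights the
    supervision; it never tells the network which regions are non-rigid."""
    residual = (pred - prior).abs()
    if residual.dim() == 3:                  # broadcast an (H, W) mask over channels
        coarse_mask = coarse_mask.unsqueeze(-1)
    weight = 1.0 + (hard_weight - 1.0) * coarse_mask
    return (weight * residual).mean()
```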
Thanks. I will try training on the running kid scene to see how it performs without masks.
Do you still have the training log (tensorboard summary) of the pretrained model (kid-running_ndc_5f_sv_of_sm_unify3)? I would like to compare with my mask-free version.
In order to see how it decomposes the fg/bg, using your pretrained model, I manually set `raw_blend_w` in this line
https://github.com/zhengqili/Neural-Scene-Flow-Fields/blob/7d8a336919b2f0b0dfe458dfd35bee1ffa04bac0/nsff_exp/render_utils.py#L1020
to either 0 or 1 (`raw_blend_w*0` or `raw_blend_w*0+1`). If I understand the code correctly, 0 means render only the bg and 1 means only the fg.
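Concretely, the override at that line looks like this (a sketch of my local change, not the original repo code):

```python
# probe the fg/bg decomposition: override the learned blending weight
raw_blend_w = raw_blend_w * 0        # 0 -> composite only the static (bg) model
# raw_blend_w = raw_blend_w * 0 + 1  # 1 -> composite only the dynamic (fg) model
```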
The command I use to render is:
`python run_nerf.py --config configs/config_kid-running.txt --render_bt --target_idx 0`
The bg (`raw_blend_w*0`):
The fg (`raw_blend_w*0+1`):
And the composed (original code with `raw_blend_w`):
In terms of image quality, the bg/fg separation doesn't matter much, so I agree the mask likely has no strong influence there, as you said. But I'm focusing on the model's capability of separating static regions from dynamic ones, which is also a claim in the paper (Fig. 5) and in the video. However, my result doesn't seem to separate bg and fg correctly; it outputs almost everything as fg. Did I misunderstand anything? If so, how do I generate bg-only and fg-only images correctly?
I believe for the foreground, you need to use blend_alpha to mask out possible static regions from the dynamic model: i.e., you need to render the fg through alpha with the blending weight, using this line:
`alpha_dy = (1. - torch.exp(-opacity_dy * dists)) * raw_blend_w`
FYI, I believe a more principled way is to remove the blending weight for training and rendering, but we always found this strategy causes many more artifacts.
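A minimal sketch of what I mean, assuming standard NeRF-style compositing (variable names follow this thread, not necessarily the exact repo code):

```python
import torch

def fg_only_weights(opacity_dy, dists, raw_blend_w):
    # dynamic alpha scaled by the learned blending weight, so regions the model
    # treats as static are suppressed in the fg-only rendering
    alpha_dy = (1. - torch.exp(-opacity_dy * dists)) * raw_blend_w
    # standard volume-rendering transmittance and per-sample weights along the ray
    T = torch.cumprod(
        torch.cat([torch.ones_like(alpha_dy[..., :1]),
                   1. - alpha_dy + 1e-10], dim=-1), dim=-1)[..., :-1]
    return alpha_dy * T  # use these weights to composite the fg color/depth
```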
> I believe for the foreground, you need to use blend_alpha to mask out possible static regions from the dynamic model: i.e., you need to render the fg through alpha with the blending weight, using this line:
> `alpha_dy = (1. - torch.exp(-opacity_dy * dists)) * raw_blend_w`
I'm not sure what you mean; that line is in the `raw2outputs_blending` function, and if I set `raw_blend_w=1` as input, it sets `alpha_dy` to what it should be and sets `alpha_rig` to zero. What else am I supposed to do?
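For reference, my reading of that blending step is roughly the following (simplified, not the verbatim repo code):

```python
import torch

def blended_alphas(opacity_dy, opacity_rig, dists, raw_blend_w):
    # with w = raw_blend_w: w = 1 keeps alpha_dy unchanged and zeroes alpha_rig,
    # so only the dynamic model gets composited, which matches what I observe
    alpha_dy = (1. - torch.exp(-opacity_dy * dists)) * raw_blend_w
    alpha_rig = (1. - torch.exp(-opacity_rig * dists)) * (1. - raw_blend_w)
    return alpha_dy, alpha_rig
```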
Hi, thanks for the code! Do you plan to publish the full data (running kid, and the other data used in the paper besides the NVIDIA ones) as well?
In fact, the thing I'd like to check the most is your motion masks' accuracy. I'd like to know if it's really possible to let the network learn to separate the background and the foreground by providing only the "coarse mask" you mentioned in the supplementary.
For example, for the bubble scene on the project page, how accurate does the mask need to be to clearly separate the bubbles from the background as you showed? Have you also experimented with the influence of mask quality, i.e. if the masks are coarser (larger), how well can the model separate bg/fg?