RobustNeRF: Ignoring Distractors with Robust Losses
Sabour et al., CVPR 2023
Neural radiance fields (NeRF) excel at synthesizing new views given multi-view, calibrated images of a static scene. When scenes include distractors, which are not persistent during image capture (moving objects, lighting variations, shadows), artifacts appear as view-dependent effects or 'floaters'. To cope with distractors, we advocate a form of robust estimation for NeRF training, modeling distractors in training data as outliers of an optimization problem. Our method successfully removes outliers from a scene and improves upon our baselines, on synthetic and real-world scenes. Our technique is simple to incorporate in modern NeRF frameworks, with few hyper-parameters. It does not assume a priori knowledge of the types of distractors, and is instead focused on the optimization problem rather than pre-processing or modeling transient objects. More results on our page https://robustnerf.github.io/public.
🔑 Key idea:
Figure 2 explains that occlusion by other objects or the concavity of an object induces ambiguity for some training rays, which tends to be encoded as view-dependent effects (different photometric values depending on the viewing direction).
In Sec. 3.1., the authors say: "The problem becomes more complex as both non-Lambertian reflectance phenomena and outliers can be explained as view-dependent radiance."
In my humble opinion, when training views are few, floaters are rendered because of that ambiguity (in a specific training view a floater may be nearly invisible, as thin as a hairline, but it shows up clearly in novel views).
In Sec. 3.2., one of the key messages is that high-frequency details are often coupled with those outliers and cannot be decoupled by the robust kernel in Eqn. 5 alone (a minimal sketch of such a trimmed robust loss follows below).
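As a concrete picture of the trimmed-estimator idea discussed around Eqn. 5, here is a minimal NumPy sketch of a robust photometric loss that zeroes out the highest-residual rays. The function names and the 0.5 inlier quantile are my own placeholders, not the paper's implementation.

```python
import numpy as np

def trimmed_weights(residuals, inlier_quantile=0.5):
    """Hard 0/1 weights for a trimmed estimator: rays whose residual
    exceeds the chosen quantile are treated as outliers (weight 0).
    The quantile value here is a placeholder, not the paper's setting."""
    threshold = np.quantile(residuals, inlier_quantile)
    return (residuals <= threshold).astype(np.float32)

def robust_photometric_loss(rendered_rgb, gt_rgb):
    """Per-ray squared residuals re-weighted by the trimmed mask.
    In a real training loop the weights would be computed from detached
    residuals (stop-gradient) and only scale the loss."""
    residuals = np.sum((rendered_rgb - gt_rgb) ** 2, axis=-1)  # (num_rays,)
    weights = trimmed_weights(residuals)                        # 0/1 per ray
    return np.mean(weights * residuals)
```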
💪 Strength:
The paper tries to pin down the sources of ambiguity that can degrade performance.
😵 Weakness:
Transient objects are only one contributor to ambiguity. How can we characterize the broader range of ambiguity sources?
The limitations section mentions that the method is "often taking longer to train." How much longer? To obtain a weight for each ray, we need to evaluate a 16x16 neighborhood in Eqn. 10, which hinders efficient training on randomly sampled rays.
The method feels increasingly ad hoc, going from trimmed estimators to a 3x3 box kernel and then to checking 16x16 neighborhoods (see the sketch after this list).
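To make the cost of the weighting scheme concrete, here is a minimal sketch (assuming SciPy and a per-image residual map) of the three stages chained together: trimmed thresholding, 3x3 box smoothing, and a 16x16 neighborhood check. All threshold values are placeholders rather than the paper's settings; the point is that the last stage needs an image-structured neighborhood, which is what conflicts with independent random ray sampling.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def patchwise_inlier_mask(residuals, inlier_quantile=0.5,
                          smoothed_threshold=0.5, patch_threshold=0.6):
    """Sketch of the three-stage weighting, assuming `residuals` is an
    (H, W) map of per-pixel photometric errors for one training image.

    1) Trimmed estimator: pixels below the residual quantile are inliers.
    2) 3x3 box kernel: smooth the binary mask to suppress isolated labels.
    3) 16x16 neighborhood: a pixel keeps weight 1 only if enough of its
       16x16 neighborhood is labelled inlier, so the weight of one ray
       depends on the residuals of its neighbors in image space.
    """
    inlier = (residuals <= np.quantile(residuals, inlier_quantile)).astype(np.float32)
    smoothed = uniform_filter(inlier, size=3)                  # 3x3 box kernel
    local_mask = (smoothed >= smoothed_threshold).astype(np.float32)
    patch_frac = uniform_filter(local_mask, size=16)           # 16x16 neighborhood mean
    return (patch_frac >= patch_threshold).astype(np.float32)
```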
🤔 Confidence:
Medium
✏️ Memo:
I cannot follow $\kappa(\epsilon)=|\epsilon|$ in Sec. 2.2. In Eqn. 2, the input of $\kappa$ is already non-negative due to the MSE loss, so the absolute value seems redundant.
How is the "ground-truth distribution of residuals" defined? I assume the residual is the difference between the ground-truth RGB pixels and the rendered RGB pixels. In that case, what does a "ground-truth distribution of residuals" mean? Is it just the histogram of those residuals, or is there a specific distinction intended between the "ground-truth distribution" and the "generated distribution of residuals"?
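For my own clarification, this is how I would visualize the two residual distributions I am asking about. The inputs are hypothetical arrays (e.g., residuals gathered from a distractor-laden capture versus a clean one), not data from the paper.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_residual_histograms(residuals_with_distractors, residuals_clean):
    """Compare the empirical residual histogram of a distractor-laden
    capture against a clean capture; both inputs are assumed 1-D arrays
    of per-ray photometric errors."""
    bins = np.linspace(0.0, 1.0, 100)
    plt.hist(residuals_clean, bins=bins, alpha=0.5, density=True,
             label="residuals on a clean scene")
    plt.hist(residuals_with_distractors, bins=bins, alpha=0.5, density=True,
             label="residuals with distractors (heavier tail)")
    plt.xlabel("per-ray photometric residual")
    plt.ylabel("density")
    plt.legend()
    plt.show()
```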