ventusff / neurecon

Multi-view 3D reconstruction using neural rendering. Unofficial implementation of UNISURF, VolSDF, NeuS and more.
MIT License
852 stars 81 forks

Positional encoding in `RadianceNet` #2

Closed shrubb closed 3 years ago

shrubb commented 3 years ago

Hi, and thanks a lot for the implementation!

https://github.com/ventusff/neurecon/blob/972e810ec252cfd16f630b1de6d2802d1b8de59a/configs/volsdf_nerfpp_blended.yaml#L41-L42

I was wondering why we are not using positional encoding here and are instead feeding raw 3D coordinates and view directions? Especially since IDR does not do this, and the defaults are 6 and 4... 🤔

I tried changing these from -1 to 6 and/or 4, and training collapses or at least becomes much slower... This seems extremely weird to me!
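For context, the "6 and 4" defaults refer to the number of frequency bands in the NeRF-style positional encoding. A minimal NumPy sketch (the function name `positional_encode` is mine, not the repo's `Embedder` class):

```python
import numpy as np

def positional_encode(x, num_freqs, include_input=True):
    """NeRF-style positional encoding: for each frequency band 2**k,
    append sin(2**k * x) and cos(2**k * x) to the (optional) raw input.
    Output dim = D + 2 * num_freqs * D when include_input is True."""
    out = [x] if include_input else []
    for k in range(num_freqs):
        freq = 2.0 ** k
        out.append(np.sin(freq * x))
        out.append(np.cos(freq * x))
    return np.concatenate(out, axis=-1)

pt = np.array([0.1, -0.2, 0.3])
# 6 frequency bands on a 3D point: 3 + 3*2*6 = 39 dims
print(positional_encode(pt, 6).shape)  # (39,)
# 4 frequency bands on a view direction: 3 + 3*2*4 = 27 dims
print(positional_encode(pt, 4).shape)  # (27,)
```

Setting the config value to -1 corresponds to skipping this entirely and passing the raw 3-dim input through.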

ventusff commented 3 years ago

Hi @shrubb, Yes, I agree that `embed_multires_view` should be 4 in every case. Sorry that this was not carefully configured before; in my experience, embedding the radiance network's inputs seems to have no noticeable influence.

However, the raw 3D coordinates may still be fed directly, as in IDR, out of respect for the official implementation's choice. But of course, feeding embedded input may lead to better results.

As for the training speed, in my test:

volsdf.yaml

  ....
  (radiance_net): RadianceNet(
    (embed_fn): Identity()
    (embed_fn_view): Embedder()
    (layers): ModuleList(
      (0): DenseLayer(
        in_features=289, out_features=256, bias=True
        (activation): ReLU(inplace=True)
      )
  ...
  0%|           | 97/100000 [00:25<6:06:50,  4.54it/s, loss_img=0.135, loss_total=0.137, lr=0.000499]
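As a sanity check, the `in_features=289` in the log above is consistent with an IDR-style radiance input where only the view direction is embedded (4 frequency bands). The breakdown below is my assumption about the concatenated inputs, not taken from the repo:

```python
# Assumed radiance-net input: [x, embedded view dir, normal, geometry feature]
D_x = 3                  # embed_multires = -1 -> Identity(), raw coordinate
D_view = 3 + 3 * 2 * 4   # embed_multires_view = 4: 3 + 24 = 27
D_normal = 3             # surface normal from the SDF gradient
D_feature = 256          # geometry feature width from the SDF network
print(D_x + D_view + D_normal + D_feature)  # 289
```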

volsdf_nerfpp_blended.yaml

You may notice that the training iterations increase from 100k to 200k, which is the main reason for the longer training time.

0%|           | 209/200000 [00:52<13:54:47,  3.99it/s, loss_img=0.215, loss_total=0.22, lr=0.0005]
0%|            | 121/200000 [00:32<14:50:09,  3.74it/s, loss_img=0.162, loss_total=0.163, lr=0.0005]
ventusff commented 3 years ago

As for whether training collapses or not, I'm running training tests on BlendedMVS, to be continued...

shrubb commented 3 years ago

Thanks for sharing your experience and especially for linking the IDR code! 🙏 Now it makes more sense.

By "slower" I actually meant convergence speed and training stability. When I apply positional encodings to the radiance net's inputs (pink curves), the losses/metrics/parameters go crazy, while with your default config training is smooth and stable (blue curves). I'll continue to investigate this. (screenshots: training curves)

ventusff commented 3 years ago

Hi @shrubb, After running some training tests, I have a possible explanation, which matches something I found out earlier:

At the early training stage, the dominant representation branch needs to be the geometry feature, so that the network can quickly find initial clues about a roughly correct shape that renders correct images.

If embedded 3D coordinates are fed to the radiance network instead of raw 3D coordinates (dim=39 instead of dim=3), the representational capacity is split ambiguously between branches: the network may assign more capacity to the radiance itself instead of learning a rough shape, leading to very slow shape convergence (or no convergence at all).

That is to say, at early stages the dominant representation branch needs to be the first of the following three:

x -> embedder -> SDF -> geometry_feature -> radiance
x -> (embedder) -> radiance
v -> (embedder) -> radiance

Embedding the location and view-direction inputs of the radiance network introduces larger gradients and "preempts" more of the gradient flow into the radiance net, leaving relatively less gradient for the surface network, as shown below:

(screenshots: gradient magnitude comparison)
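The "larger gradients" point can be illustrated with a back-of-envelope calculation: the k-th frequency band sin(2^k x) has derivative 2^k cos(2^k x), so each band scales gradients w.r.t. the input by up to 2^k, versus a scale of 1 for the raw coordinate. A rough sketch (this is my own illustration, not the authors' analysis):

```python
def encoding_gradient_scale(num_freqs):
    """Worst-case Jacobian scale of a NeRF-style encoding w.r.t. the raw
    input: each band contributes |d/dx sin(2**k * x)| <= 2**k (same for
    cos), versus a scale of 1 for the raw coordinate itself."""
    return 2 * sum(2.0 ** k for k in range(num_freqs))

print(encoding_gradient_scale(6))  # 126.0, vs. 1.0 for a raw input
print(encoding_gradient_scale(4))  # 30.0 for the view-direction embedding
```

This suggests why the radiance branch can soak up a disproportionate share of the gradient flow when its inputs are embedded.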

Practical comparison of validation normals during training (you can see that no meaningful shapes are learned in the latter two cases):

ventusff commented 3 years ago

But still, in VolSDF, embedding the view direction leads to no convergence at all. This remains weird, since it works fine with NeuS.

The VolSDF paper does not mention whether the input of the radiance network is embedded. I'm looking forward to their official implementation for code comparison.

shrubb commented 3 years ago

Makes perfect sense, thank you for the insight and the experiments! 🎉

ventusff commented 3 years ago

Glad to know it helps 😄