ogroth / tf-gqn

Tensorflow implementation of Neural Scene Representation and Rendering
Apache License 2.0

Here is why the loss stays in a local minimum ... #37

Open · PengchengWang opened 5 years ago

PengchengWang commented 5 years ago

I've trained the model on rooms_ring_camera twice, and every time the loss stopped around 7000. So I ran view_interpolation.ipynb and got these results. (attached: two screenshots of the rendered views)

wenbingz commented 5 years ago

I wish the author could upload a trained model to convince us...

ogroth commented 5 years ago

Hi @PengchengWang, your observation is correct. I've experienced similar behavior when I trained on rooms_ring_camera. The model learns to render the square rooms (i.e. floor, background and sky) relatively well from the correct angle. However, the rendered rooms are mostly empty until very late in the training process (>1.5M iterations), and even then the rendered objects are just blurry glimpses. My hunch is that a careful learning rate schedule needs to be implemented in order to get the object details right. It is also very plausible that the network learns to render empty rooms first, since these parts constitute the lion's share of the images and therefore dominate the L2 reconstruction term.

However, the loss plateauing around 6.9k is expected. At some point the reconstruction term can do no better than match the means of the target images, and the only thing optimized afterwards is the KL divergence (which should still continue to decrease).

@wenbingz The only datasets on which I managed to do the view interpolation magic reliably are the Shepard-Metzler datasets. I'm trying to upload a good snapshot together with a run script.
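For context, a minimal sketch of the point about the plateau and the annealing schedules, under two assumptions not confirmed in this thread: that the reported loss is the per-image negative Gaussian log-likelihood (in nats) over 64×64×3 pixels, and that the hyper-parameters follow the GQN paper (learning rate annealed from 5e-4 to 5e-5 over 1.6M steps, pixel standard deviation annealed from 2.0 to 0.7 over 200k steps). The function names below are illustrative, not this repo's API.

```python
import math

# Sketch only: schedule constants taken from the GQN paper (Eslami et al., 2018);
# they are NOT confirmed to match this repository's defaults.
MU_I, MU_F, N_MU = 5e-4, 5e-5, 1.6e6        # learning rate: initial, final, annealing steps
SIGMA_I, SIGMA_F, N_SIGMA = 2.0, 0.7, 2e5   # pixel std-dev: initial, final, annealing steps

def annealed(step, v_init, v_final, n_anneal):
    """Linearly anneal from v_init down to v_final over n_anneal steps, then hold."""
    return max(v_final, v_init + (v_final - v_init) * (step / n_anneal))

def learning_rate(step):
    return annealed(step, MU_I, MU_F, N_MU)

def pixel_sigma(step):
    return annealed(step, SIGMA_I, SIGMA_F, N_SIGMA)

# Floor of the reconstruction term once the predicted means are perfect:
# for a factorized Gaussian likelihood over a 64x64x3 image, the negative
# log-likelihood at zero reconstruction error is just the log-normalization
# constant, 0.5 * ln(2*pi*sigma^2) per pixel channel.
n_dims = 64 * 64 * 3
sigma = pixel_sigma(2_000_000)              # 0.7 once annealing has finished
nll_floor = n_dims * 0.5 * math.log(2.0 * math.pi * sigma ** 2)
print(f"learning rate @ 1M steps: {learning_rate(1_000_000):.2e}")
print(f"reconstruction floor @ sigma={sigma}: {nll_floor:.0f} nats")  # ~6.9e3
```

Under these assumptions, the normalization constant alone contributes roughly 12288 · 0.5·ln(2π·0.7²) ≈ 6.9k nats for a 64×64×3 image even with perfect reconstruction of the means, which would be consistent with the plateau described above, leaving only the KL term to keep decreasing.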

ogroth commented 5 years ago

@PengchengWang @wenbingz I've uploaded model snapshots for the Shepard Metzler datasets, together with tensorboard summaries, run commands and hyper-parameters. Please feel free to check them out here.