vislearn / ngdsac_horizon

Neural-Guided, Differentiable RANSAC for Horizon Line Estimation
BSD 3-Clause "New" or "Revised" License

Predicting distribution over pixels #1

Closed djr2015 closed 4 years ago

djr2015 commented 4 years ago

Thank you for such a pedagogical introduction to your work!

I would love to know your thoughts on whether it would be practical to learn a sampling distribution over image pixels directly (i.e. using NG-RANSAC on its own), rather than over the intermediate sampling locations whose positions you predict via DSAC (and then sample over using NG-RANSAC)?

Since this would mean far more potential sampling locations, I imagine the CNN weights would need to be well initialized (e.g. by pre-training on some intermediate objective), but that it could yield better precision for the output line, given that its endpoints would be sampled at pixel-level precision.

ebrach commented 4 years ago

Hi Dominic,

technically, predicting a distribution over pixels is certainly possible. We do that for the camera re-localization experiments of the paper (code will be published soon). I guess, depending on the image resolution, you could run into numerical problems at some point (when normalizing over many pixels) but I imagine there are ways around that.

Whether it would help with precision, I'm not sure. You suggest that having pixel-level precision would be an improvement, but I would argue quite the contrary: the offset vectors predicted with NG-DSAC already support sub-pixel precision. On the other hand, you would have a weighted average over many pixels, so pixel-level precision is probably not so much of a limiting factor anyway.
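To illustrate the averaging point with a toy example (my numbers, not the repo's code): a weighted average over integer pixel coordinates is itself a continuous quantity, so quantization to the pixel grid does not cap the precision of the averaged estimate.

```python
# Integer pixel coordinates with a (hypothetical) predicted distribution:
coords = [10, 11, 12]
weights = [0.2, 0.5, 0.3]  # sums to 1

# The expectation lands between pixel centers, i.e. sub-pixel:
estimate = sum(c * w for c, w in zip(coords, weights))
# estimate is approximately 11.1, not an integer pixel position
```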

I think the difference between the two approaches will boil down to the dynamics of the underlying optimization problem (i.e. what is easier for a neural network to learn), and that is hard to predict. If you give your idea a go, let me know. I would be interested to see what happens :-)

Best, Eric

djr2015 commented 4 years ago

Hi Dr. Brachmann,

Thank you for your insight! Based on initial experiments, I do indeed seem to be able to get sufficiently high precision by fitting lines to pixels during the forward pass.

However, I am also having issues during training, and have the following question about the implementation of NG-RANSAC to help debug my code: when you use NG-RANSAC in the context of epipolar geometry ( https://github.com/vislearn/ngransac/blob/master/ngransac_train_e2e.py ), I noticed you do not normalize the log_probs_grad tensor by the batch size. However, in this repo you normalize the equivalent tensor g_log_probs by the batch size here, and by an additional factor of 10 here. I was wondering why this is?

ebrach commented 4 years ago

Hi Dominic,

normalizing the gradients by the batch size, or multiplying the gradients by a constant factor, is equivalent to multiplying the learning rate by the corresponding factor. So when I re-scale the neural guidance gradients in the code, I am essentially using separate learning rates for the two output heads. I'm not completely sure whether the scaling by a factor of 10 really made a difference. I left it in because this is how the numbers in the paper were produced.
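For plain SGD this equivalence is exact, as a quick sketch shows (toy numbers, not from the repo). With Adam the picture is less clear-cut, since its per-parameter normalization largely absorbs a constant gradient scale, which may be part of why the factor 10 made little observable difference.

```python
def sgd_step(w, grad, lr):
    # One vanilla SGD update: w <- w - lr * grad
    return w - lr * grad

w, grad, lr, c = 2.0, 0.5, 0.125, 10.0

# Scaling the gradient by a constant c ...
w_scaled_grad = sgd_step(w, c * grad, lr)
# ... gives exactly the same update as scaling the learning rate by c:
w_scaled_lr = sgd_step(w, grad, c * lr)
# both updates: 2.0 - 0.625 = 1.375
```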

In summary, if you have issues during training, try a smaller learning rate for the neural guidance output. Neural guidance gradients tend to be rather large, but on the other hand, the Adam implementation in PyTorch seems to be rather robust.

Best, Eric