jiahaoLjh closed this issue 3 years ago.
Same confusion.
I changed the folder name "data/test/" to "/data/train/" and tried to run the training code, but it does not run successfully. Could you (the authors) please check whether something is wrong?
Many thanks!
Hello, we will look into the code and get back to you asap.
Hi,
Is there an answer to the "differentiability" problem? Or do you actually mean that the particular voxel chosen in the geo-depth optimization stage is differentiable w.r.t. the satellite depth? In that case it is certainly differentiable, but it follows that only a small part of the estimated satellite depth (and of the stages before it) receives a gradient in any single forward/backward pass, since all height values except the chosen one are discarded.
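To make the point concrete, here is a minimal sketch, assuming one pixel's column of candidate height values and a hypothetical `pick_first_above` selection rule (not the repository's code). A central finite-difference gradient shows that only the selected entry receives a nonzero gradient, while every discarded height gets zero.

```python
def pick_first_above(heights, threshold):
    """Hypothetical selection rule standing in for "the chosen voxel".

    Returns the first height value above `threshold`; every other value
    is discarded, like the non-chosen heights in the discussion.
    """
    for h in heights:
        if h > threshold:
            return h
    return 0.0  # no height selected


def finite_diff_grad(f, xs, eps=1e-4):
    """Central finite-difference gradient of scalar f w.r.t. each xs[i]."""
    grad = []
    for i in range(len(xs)):
        up, down = list(xs), list(xs)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


heights = [0.5, 1.2, 3.7, 2.9]  # made-up height estimates for one column
grad = finite_diff_grad(lambda hs: pick_first_above(hs, 1.0), heights)
print([round(g, 6) for g in grad])  # → [0.0, 1.0, 0.0, 0.0]
```

The selected value passes its gradient through unchanged (1.0) and the other heights get exactly zero. Note also that the choice of *which* index is selected carries no gradient at all.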
Oh, sorry, we forgot to reply to this thread. Yes, what you describe is exactly what we meant in the paper.
Hi,
May I confirm that the whole pipeline is in fact not end-to-end trainable? It is not end-to-end differentiable because there are discretization operations in computing the occupancy grid. The depths converted from the satellite image height map have fractional parts. When computing the occupancy grid (the "particular voxel chosen", as TomQuartz put it), these values are rounded to integers. It is this operation that makes the whole pipeline non-differentiable. Is this correct?
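A quick way to see the effect of rounding: the sketch below uses a hypothetical `depth_to_voxel_index` with an assumed 0.5 m voxel size (neither is taken from the repository) to convert a continuous depth into an integer voxel index. Its finite-difference derivative is zero almost everywhere, so no gradient reaches the depth through this step.

```python
VOXEL_SIZE = 0.5  # assumed voxel edge length in metres (illustrative only)


def depth_to_voxel_index(depth):
    """Hypothetical discretization step: continuous depth -> integer voxel index."""
    return round(depth / VOXEL_SIZE)


def derivative(f, x, eps=1e-4):
    """Central finite-difference derivative of f at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


depths = [3.13, 3.27, 3.61]  # made-up converted depths with fractional parts
indices = [depth_to_voxel_index(d) for d in depths]
derivs = [derivative(depth_to_voxel_index, d) for d in depths]
print(indices)  # → [6, 7, 7]
print(derivs)   # → [0.0, 0.0, 0.0]
```

Away from the half-voxel boundaries the derivative is exactly zero, and at a boundary the index jumps, so the rounding step provides no useful gradient for back-propagation.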
> Is there an answer to the "differentiability" problem? Or do you actually mean that the particular voxel chosen in the geo-depth optimization stage is differentiable w.r.t. satellite depth?
If I am not mistaken, there is supervision only on the satellite image depth and none on the street-view panorama depth, right? If so, the geo-depth optimization stage does not need the geo-transformation (it only estimates satellite height maps), so it does not matter whether the geo-transformation stage is differentiable or not.
Thank you very much. Your explanation is really appreciated.
Hi,
This is very nice and inspiring work!
As you mention in the paper, the geo-transformation procedure is differentiable. I'm not clear on what you mean by "differentiable" here. Do you mean that the loss from the later stage (the street-view stage) can be back-propagated to the satellite stage to better learn the satellite semantics and depth? It's not intuitive to me why this transformation is differentiable, since there is a discretization step when you generate the voxel-based occupancy grid and search for the first voxel encountered.
Thanks.
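A toy 1-D version of the "first voxel encountered" search illustrates the concern. The sketch below assumes a hypothetical horizontal ray at a fixed camera height marching across a row of height-map cells with an assumed 1 m cell size; it is not the paper's geo-transformation. The returned depth only changes when a height crosses the ray height, so its derivative with respect to each height is zero except at those crossings.

```python
CELL_SIZE = 1.0      # assumed ground distance covered by one height-map cell (m)
CAMERA_HEIGHT = 1.5  # assumed street-view camera height (m)


def first_hit_depth(heights, max_depth=100.0):
    """Distance to the first cell whose height blocks a horizontal ray.

    Hypothetical 1-D stand-in for searching the occupancy grid for the
    first occupied voxel; the camera sits at the near edge of cell 0 and
    the depth is measured to the centre of the hit cell.
    """
    for i, h in enumerate(heights):
        if h >= CAMERA_HEIGHT:  # cell is occupied at the ray's height
            return (i + 0.5) * CELL_SIZE
    return max_depth  # no hit inside the grid


def finite_diff_grad(f, xs, eps=1e-4):
    """Central finite-difference gradient of scalar f w.r.t. each xs[i]."""
    grad = []
    for i in range(len(xs)):
        up, down = list(xs), list(xs)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


row = [0.2, 0.8, 2.5, 4.0]  # made-up satellite heights along the ray
print(first_hit_depth(row))                    # → 2.5
print(finite_diff_grad(first_hit_depth, row))  # → [0.0, 0.0, 0.0, 0.0]

bumped = [0.2, 1.6, 2.5, 4.0]  # raise cell 1 above the camera height
print(first_hit_depth(bumped))                 # → 1.5 (a jump, not a smooth change)
```

Small changes to any height leave the depth unchanged, and a large enough change makes it jump, which is why a loss on the street-view depth would not, by itself, send a useful gradient back through this search.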