lizuoyue / sate_to_ground

Official Code of the CVPR 2020 Paper: Geometry-Aware Satellite-to-Ground Image Synthesis for Urban Areas

Differentiability of the Geo-transformation stage #1

Closed jiahaoLjh closed 3 years ago

jiahaoLjh commented 4 years ago

Hi,

This is very nice and inspiring work!

As you mention in the paper, the geo-transformation procedure is differentiable. I'm not clear on what you mean by "differentiable" here. Do you mean that the loss from the later stage (the street-view stage) can be back-propagated to the satellite stage to better learn the satellite semantics and depth? It's not intuitive to me why this transformation is differentiable, since there is a kind of discretization when you generate the voxel-based occupancy grid and search for the first voxel encountered.

Thanks.

YujiaoShi commented 4 years ago

Same confusion.

I changed the folder name "data/test/" to "/data/train/" and tried to run the training code, but it does not run successfully. Could you (the authors) please check whether something is wrong?

Many thanks!

lizuoyue commented 4 years ago

Hello, we will look into the code and get back to you asap.

TomQuartz commented 3 years ago

Hi,

Is there an answer to the "differentiability" problem? Or do you actually mean that the particular voxel chosen in the geo-depth optimization stage is differentiable w.r.t. satellite depth? In that case it is certainly differentiable, but it follows that only a small part of the estimated satellite depth, and of the stages before it, receives a gradient in one forward pass, since all height values except the chosen one are discarded.
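To make the "only the chosen voxel gets a gradient" point concrete, here is a minimal toy sketch. It is not the repository's code: the 1D height profile, the `ray_depth` and `finite_diff_grad` helpers, and all numbers are assumptions made up for illustration. A ray is marched over the height profile, the first column it hits is picked by a discrete search, and the hit distance is then a smooth function of that one column's height. Finite differences show which heights influence the result.

```python
import math

def ray_depth(heights, cam_x, cam_z, dx, dz, step=1e-3, max_t=50.0):
    """Return (k, t): first column hit by a descending ray, and the hit distance.

    Toy model, assumes dx > 0 and dz < 0. Choosing k is a discrete search;
    once k is fixed, a roof hit has t = (cam_z - heights[k]) / (-dz),
    which depends smoothly on heights[k] and on no other height.
    """
    t = 0.0
    while t < max_t:
        x, z = cam_x + t * dx, cam_z + t * dz
        k = int(math.floor(x))          # discrete column lookup
        if not 0 <= k < len(heights):
            break                       # ray left the height profile
        if z <= heights[k]:
            t_roof = (cam_z - heights[k]) / (-dz)  # where the ray meets the roof plane
            t_wall = (k - cam_x) / dx              # where the ray enters column k
            return k, max(t_roof, t_wall)
        t += step
    return None, math.inf

def finite_diff_grad(heights, eps=1e-4, **ray):
    """Forward-difference estimate of d(depth)/d(heights[i]) for every column i."""
    _, base = ray_depth(heights, **ray)
    grad = []
    for i in range(len(heights)):
        bumped = list(heights)
        bumped[i] += eps
        _, t = ray_depth(bumped, **ray)
        grad.append((t - base) / eps)
    return grad

# Hypothetical scene: flat ground, then columns of height 2.2, 5.0 and 1.0.
heights = [0.0, 0.0, 0.0, 2.2, 5.0, 1.0]
ray = dict(cam_x=0.5, cam_z=3.0, dx=1.0, dz=-0.25)

k, t = ray_depth(heights, **ray)
grad = finite_diff_grad(heights, **ray)
print("hit column:", k, "distance:", round(t, 6))
print("d depth / d heights:", [round(g, 6) for g in grad])
```

In this toy setting only the entry for the hit column is nonzero; every other height gets a zero gradient for this ray, which is the sparsity described above. A full panorama casts many rays, so more columns would receive gradient, but each ray still contributes to a single one.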

lizuoyue commented 3 years ago

Oh sorry, we forgot to reply to this thread. Yes, what you say is exactly what is mentioned in the paper.

YujiaoShi commented 3 years ago

Hi,

May I confirm that the whole pipeline is in fact not end-to-end trainable? It is not end-to-end differentiable because there are discretized operations in computing the occupancy grid. The depths converted from the satellite height map are fractional. When computing the occupancy grid (the particular voxel chosen, as TomQuartz said), these fractional values are rounded to integers. It is this rounding operation that makes the whole pipeline non-differentiable. Is this correct?
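The rounding concern can be illustrated with a small sketch. This is a toy under stated assumptions, not the repository's implementation: `voxel_index`, the voxel size of 0.5, and the two loss functions are hypothetical names and values chosen only for the demonstration. It compares the finite-difference gradient of a loss computed on a rounded voxel index with the same loss computed on the unrounded depth.

```python
def voxel_index(depth, voxel_size=0.5):
    """Quantize a continuous depth to an integer voxel index (hypothetical grid)."""
    return round(depth / voxel_size)

def loss_rounded(depth, target=4.0, voxel_size=0.5):
    """Squared error measured after snapping the depth to the voxel grid."""
    return (voxel_index(depth, voxel_size) * voxel_size - target) ** 2

def loss_continuous(depth, target=4.0):
    """Squared error measured on the raw, unrounded depth."""
    return (depth - target) ** 2

def central_diff(f, x, eps=1e-4):
    """Central finite-difference estimate of df/dx at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

depth = 3.1
print("gradient through rounding:", central_diff(loss_rounded, depth))
print("gradient without rounding:", central_diff(loss_continuous, depth))
```

Away from the points where the index jumps, a small change in depth leaves the rounded index unchanged, so the first gradient is zero and no learning signal reaches the depth through that path. The second gradient is nonzero. Whether the actual pipeline rounds in exactly this way is the question being asked here, not something this sketch settles.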

(Quoting TomQuartz:) "Is there an answer to the "differentiability" problem? Or do you actually mean that the particular voxel chosen in the geo-depth optimization stage is differentiable w.r.t. satellite depth?"

If I am not mistaken, there is supervision only on the satellite image depth and none on the street-view panorama depth? If so, the geo-depth optimization stage doesn't need the geo-transformation (as it only estimates satellite height maps), so it does not matter whether the geo-transformation stage is differentiable or not.

Thank you very much. Your explanation would be really appreciated.