Open · PMRS-lab opened this issue 6 days ago
Thank you for your interest in this project. If you check the training code, you can see that the next_pair_batch function is called here: https://github.com/tudelft-iv/CrossViewMetricLocalization/blob/cc76e78a0f7d4396617565a7d976ee9caf70f9d8/train_Oxford.py#L132
The cropping is done at https://github.com/tudelft-iv/CrossViewMetricLocalization/blob/cc76e78a0f7d4396617565a7d976ee9caf70f9d8/readdata_Oxford.py#L227
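As a rough illustration of the idea (this is only a sketch; the function name, crop size, and offset range below are illustrative and not the actual code in readdata_Oxford.py):

```python
import numpy as np

def crop_with_random_offset(sat_image, crop_size=512, max_offset=100, rng=None):
    """Cut a square patch out of a large (stitched) satellite image, with the
    crop center shifted by a random offset so the ground-camera position is
    not always in the middle of the patch. Sketch only: all sizes/ranges are
    illustrative and assume the stitched image is large enough for the crop."""
    rng = np.random.default_rng() if rng is None else rng
    h, w, _ = sat_image.shape
    # draw a fresh random shift of the crop center at every iteration
    dx, dy = rng.integers(-max_offset, max_offset + 1, size=2)
    cx, cy = w // 2 + dx, h // 2 + dy
    half = crop_size // 2
    patch = sat_image[cy - half:cy + half, cx - half:cx + half]
    return patch, (dx, dy)  # the offset doubles as the localization label
```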
Thank you very much for your answer. Another question: does the network not reduce the perspective difference between the ground and satellite images when extracting features? From the network structure you showed, it seems to learn directly from the ground images without applying any perspective transformation. Will this affect the similarity computation? After all, if features are extracted directly without changing the perspective, the difference between the two views is quite large.
Is it correct to say that the satellite images are cropped during training, rather than first exporting the cropped patches as a separate satellite image-patch dataset?
Regarding the dataloader, the cropping is done on the fly, since the random offsets are generated at each iteration.
Regarding the model architecture, we did not use the common perspective transformations, e.g., polar transformation and homography, because: 1. the polar transformation assumes center alignment between the ground and aerial views, an assumption that does not hold for fine-grained localization; 2. a homography ignores above-ground objects, limiting the model to lane markings, etc.
In general, we find that global descriptors are strong enough to pull the two views together. Of course, a strong perspective transformation may further improve performance, but we see obvious limitations in the commonly used ones.
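For readers unfamiliar with the polar transformation mentioned above: it resamples the aerial image around its center into (azimuth, range) coordinates, which is exactly where the center-alignment assumption comes from. A minimal sketch (not code from this repository; output size and sampling details are illustrative):

```python
import numpy as np

def polar_transform(aerial, out_h=128, out_w=512):
    """Resample an aerial image into polar coordinates around its center.
    The ground camera is implicitly assumed to sit exactly at that center,
    which is why this suits centered retrieval setups but not the metric
    offset estimation discussed here. Illustrative sketch, not repo code."""
    H, W, C = aerial.shape
    cy, cx = (H - 1) / 2.0, (W - 1) / 2.0
    radius = min(cy, cx)
    out = np.zeros((out_h, out_w, C), dtype=aerial.dtype)
    for i in range(out_h):              # i: distance from the image center
        r = radius * i / (out_h - 1)
        for j in range(out_w):          # j: azimuth angle
            theta = 2 * np.pi * j / out_w
            y = int(round(cy - r * np.cos(theta)))
            x = int(round(cx + r * np.sin(theta)))
            out[i, j] = aerial[np.clip(y, 0, H - 1), np.clip(x, 0, W - 1)]
    return out
```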
Thank you very much. Yesterday you mentioned the common perspective projection transformations, and I would like to ask an open-ended question: do you think an orthographic projection transformation would give better results than the polar coordinate transformation and the homography transformation? Best wishes!
I couldn't find any executable function in the readdata_Oxford.py file; it only contains class methods. How can I accurately crop the stitched satellite image to the area corresponding to the ground image?