vislearn / dsacstar

DSAC* for Visual Camera Re-Localization (RGB or RGB-D)
BSD 3-Clause "New" or "Revised" License

Does re-scaling damage the unknown scene coordinate masks? #13

Closed qiyan98 closed 2 years ago

qiyan98 commented 2 years ago

Hi,

Thanks for the wonderful open-sourced project (again)!

I have a question about the potentially harmful effects of label re-scaling. Re-scaling the image is generally fine, but re-scaling the 3D labels may change the 0 values that mark invalid scene coordinates. https://github.com/vislearn/dsacstar/blob/3ffbcb1d4d7b0cae68902560b5a2296d8c1b77e6/dataset.py#L187-L199

In the loss function, the mask is used as follows: https://github.com/vislearn/dsacstar/blob/3ffbcb1d4d7b0cae68902560b5a2296d8c1b77e6/train_init.py#L191-L192
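For readers without the repository at hand, here is a minimal sketch of how a zero-valued validity mask typically enters a scene coordinate loss. It is not the code at the link above: the function name `masked_coord_loss`, the `(3, H, W)` tensor layout, and the rule that an all-zero ground-truth vector means "invalid" are assumptions made for illustration.

```python
import torch


def masked_coord_loss(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """Mean Euclidean error over valid pixels only (hypothetical helper).

    pred, gt: scene coordinate maps of shape (3, H, W).
    Assumption: a ground-truth pixel is invalid when all three of its
    coordinates are exactly zero.
    """
    # Boolean (H, W) mask: True where the label is not the all-zero vector.
    valid = gt.abs().sum(dim=0) > 0

    # Per-pixel Euclidean distance between prediction and label, shape (H, W).
    dist = torch.norm(pred - gt, dim=0)

    # With no valid pixels, return a zero that still carries the graph.
    if not valid.any():
        return dist.sum() * 0.0

    return dist[valid].mean()
```

If re-scaling blended zeros with neighbouring valid coordinates, the blended pixels would pass a check like `valid` while carrying wrong 3D positions, which is the concern raised here.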

We are concerned about this in our project, as the training labels might become inaccurate after augmentation. I wonder if you have any insights on this issue.

Many thanks!

qiyan98 commented 2 years ago

As indicated in the F.interpolate documentation, the default interpolation mode is 'nearest'. Do you think this helps to justify re-scaling the regression labels?

mode (str) – algorithm used for upsampling: 'nearest' | 'linear' | 'bilinear' | 'bicubic' | 'trilinear' | 'area'. Default: 'nearest'

Thanks.

ebrach commented 2 years ago

Hi,

Yes, nearest interpolation is important so that zeros (i.e. invalid labels) and non-zero entries are not mixed. Results might differ for your specific project, but we have seen no problems with re-scaling labels like this. Note that, depending on how you train (RGB mode or end-to-end training), these labels are only used as a coarse target or an initialisation. The training will refine these labels in most circumstances. The only case where this does not happen would be pre-training in RGB-D mode and then omitting end-to-end training.
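The point about not mixing zeros can be checked with a small sketch. The toy tensor below is made up for illustration (left half invalid, right half set to 5.0) and is not taken from the dataset code. It compares the default 'nearest' mode of `F.interpolate` with 'bilinear'.

```python
import torch
import torch.nn.functional as F

# Hypothetical 1x3x4x4 scene coordinate map: the left two columns are
# invalid (all zeros), the right two columns hold the value 5.0.
coords = torch.zeros(1, 3, 4, 4)
coords[:, :, :, 2:] = 5.0

# Default mode is 'nearest': every output pixel copies one input pixel.
nearest = F.interpolate(coords, size=(6, 6))

# Bilinear blends neighbouring pixels, including zeros with valid values.
bilinear = F.interpolate(coords, size=(6, 6), mode="bilinear", align_corners=False)


def has_mixed_values(t: torch.Tensor) -> bool:
    """True if the tensor contains values other than the original 0.0 and 5.0."""
    return bool(((t != 0.0) & (t != 5.0)).any())


print("nearest mixes labels: ", has_mixed_values(nearest))
print("bilinear mixes labels:", has_mixed_values(bilinear))
```

With nearest interpolation every re-scaled label is either an original valid coordinate or an exact zero, so a zero-based validity mask stays meaningful. With bilinear interpolation, pixels at the border between valid and invalid regions take blended values that are non-zero but do not correspond to any real 3D point.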

Best, Eric