Mismatch in number of images after reformatting MSLS train and val

divyagupta25 commented 1 year ago

Hi,

As I understand it, the reformat script reorganizes the directory tree of the 24 train and val cities. I counted the images in my downloaded directory and in the reformatted directory, but the numbers do not match.

I am using `find DIR_NAME -name "*.jpg" | wc -l` on the train_val directory, which has the following tree structure (truncated):

train_val/
├── amman
│   ├── database
│   │   └── images
│   └── query
│       └── images
├── amsterdam
│   ├── database
│   │   └── images
│   └── query
│       └── images
├── austin
│   ├── database
│   │   └── images
│   └── query
│       └── images
├── bangkok
│   ├── database
│   │   └── images
│   └── query
│       └── images
├── berlin
│   ├── database
│   │   └── images
│   └── query
│       └── images
├── boston
│   ├── database
│   │   └── images
│   └── query
│       └── images
...

and the reformatted directory has the following tree structure (truncated)

MSLS_reformatted/
├── train
│   ├── database
│   │   ├── 01wued1CR1e8RUnYJV5ixA
│   │   ├── 02glveumgzo7qevizo9uzl
│   │   ├── 04dRUtmVRfGu6tAQKyd8qg
│   │   ├── 07Zr1na-73N9IyFbRi0ZRg
│   │   ├── 0a7yoyu12mp3e35rled934
...

I get 1464298 images in the originally downloaded directory (train_val/) and 1447861 images in the reformatted directory. Could you please suggest what the issue could be?
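For anyone who wants to reproduce the count without `find`, here is a minimal Python sketch of the same check. The helper names `count_jpgs` and `count_jpgs_per_subdir` are made up for illustration and are not part of the repository; the per-subdirectory breakdown is only meant to help narrow down where images go missing.

```python
from collections import Counter
from pathlib import Path


def count_jpgs(root):
    """Count *.jpg files anywhere under root (roughly `find root -name "*.jpg" | wc -l`)."""
    return sum(1 for p in Path(root).rglob("*.jpg") if p.is_file())


def count_jpgs_per_subdir(root):
    """Count *.jpg files grouped by the first-level subdirectory of root.

    Files that sit directly in root are grouped under ".".
    """
    root = Path(root)
    counts = Counter()
    for p in root.rglob("*.jpg"):
        if not p.is_file():
            continue
        rel = p.relative_to(root)
        # rel.parts[0] is the first directory below root (e.g. a city name)
        group = rel.parts[0] if len(rel.parts) > 1 else "."
        counts[group] += 1
    return counts
```

Calling `count_jpgs("train_val")` and `count_jpgs("MSLS_reformatted")` should give the two totals to compare. With the numbers reported above, the gap is 1464298 - 1447861 = 16437 images.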

ga1i13o commented 1 year ago

Hello, I can confirm that I get the same image counts that you report. The reformatting script skips the panorama images, which appear sporadically in MSLS (around 15k) and are not suited to the seq2seq paradigm.

If you want to disable this behavior, you can comment out these lines: https://github.com/vandal-vpr/vg-transformers/blob/3947df2469d54aca7dfe3b6f6b5b22c242c1c41b/main_scripts/msls/1_reformat_mapillary.py#L119-L120
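To check how many panoramas are in your own download, a rough sketch along these lines may help. It assumes that each `train_val/<city>/query` and `train_val/<city>/database` folder contains a `raw.csv` with a `pano` column holding a true/false flag; that layout is an assumption about the MSLS metadata, not something taken from the reformatting script, and `count_panoramas` is a hypothetical helper.

```python
import csv
from pathlib import Path


def count_panoramas(root):
    """Count metadata rows flagged as panoramas under root.

    Assumes (not verified here) a layout of <root>/<city>/<split>/raw.csv
    where each raw.csv has a `pano` column with values like "True"/"False".
    """
    total = 0
    for csv_path in Path(root).glob("*/*/raw.csv"):
        with open(csv_path, newline="") as f:
            for row in csv.DictReader(f):
                # A missing or empty field is treated as "not a panorama"
                flag = (row.get("pano") or "").strip().lower()
                if flag in {"true", "1"}:
                    total += 1
    return total
```

This counts metadata rows rather than files on disk, so the result may not match the difference between the two directories exactly.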

divyagupta25 commented 1 year ago

Thank you for the information!

vandal-vpr / vg-transformers

Mismatch in number of images after reformatting MSLS train and val #17