For all ViT and CCT experiments, as you can see in our code, we start from weights pre-trained on ImageNet. We then train them on each of the datasets in the main table of our paper (Table 1): the 'Melbourne' column reports models trained on Melbourne and tested on the MSLS test set; the second column reports models trained on all of MSLS and tested on the MSLS test set; the last column reports models trained and tested on RobotCar, with the splits described in the paper.
Yes, thank you for reminding me. We are currently exploring hosting options, so for now I have added a link in the README to the model you requested, and hopefully more will follow soon.
Hi, thanks for sharing your work! I have a few questions about the models: