Understanding `crop_size` and `Resize` in the official MMSegmentation tutorial

The MMSegmentation Colab tutorial (https://colab.research.google.com/github/open-mmlab/mmsegmentation/blob/master/demo/MMSegmentation_Tutorial.ipynb) uses the Stanford Background dataset, whose images are 320x240. The model backbone was trained on Cityscapes with 512x1024 inputs. However, the config sets `cfg.crop_size = (256, 256)` and later adds these augmentations:

dict(type='Resize', img_scale=(320, 240), ratio_range=(0.5, 2.0)),
dict(type='RandomCrop', crop_size=cfg.crop_size, cat_max_ratio=0.75)
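
To make the shape arithmetic concrete, here is a minimal sketch of what I assume these two transforms do to the image size. It is plain Python, not MMSegmentation code: the helper names `sample_resize_shape` and `crop_shape` are hypothetical, and the keep-aspect-ratio rule is my assumption about how `Resize` behaves with `ratio_range`.

```python
import random


def sample_resize_shape(img_hw, img_scale, ratio_range, rng):
    """Return the (h, w) after a keep-ratio resize to a randomly scaled target.

    Assumption: a ratio is drawn uniformly from ratio_range, both entries of
    img_scale are multiplied by it, and the image is then rescaled so that its
    long edge fits the larger entry and its short edge fits the smaller one.
    """
    h, w = img_hw
    ratio = rng.uniform(*ratio_range)
    target = (int(img_scale[0] * ratio), int(img_scale[1] * ratio))
    factor = min(max(target) / max(h, w), min(target) / min(h, w))
    return int(h * factor + 0.5), int(w * factor + 0.5)


def crop_shape(img_hw, crop_size):
    """Return the (h, w) of a crop; it can never exceed the image itself."""
    return min(img_hw[0], crop_size[0]), min(img_hw[1], crop_size[1])


rng = random.Random(0)
for _ in range(5):
    # Stanford Background image (240 high, 320 wide) with the tutorial settings.
    resized = sample_resize_shape((240, 320), (320, 240), (0.5, 2.0), rng)
    cropped = crop_shape(resized, (256, 256))
    print(resized, "->", cropped)
```

Under these assumptions the resized image ranges from 120x160 to 480x640 (height x width), so a 256x256 crop sometimes covers only part of the image and sometimes is limited by an image smaller than the crop.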

My questions are:

1. Since the model layers are unchanged, I am guessing that somewhere inside the model the input is always rescaled to 512x1024. Is that true?
2. What is the purpose of `crop_size`? Should it ideally match the original training size, i.e., 512x1024 in this case?
3. What are the consequences of using a different `crop_size`? Does it reduce performance?
4. How were `crop_size` and `Resize` chosen here, and what is the general idea for choosing them?

In my case, the dataset has 512x512 images and the model is Mask2Former, trained on Cityscapes at 512x1024. Should I set `crop_size` to 512x1024 or 512x512? And do I need to add a `Resize` augmentation?

Edit: I could not find any documentation on `crop_size` in the official docs, apart from this example: `crop_size = (512, 1024)  # The crop size during training.`
