The MMSegmentation Colab tutorial (https://colab.research.google.com/github/open-mmlab/mmsegmentation/blob/master/demo/MMSegmentation_Tutorial.ipynb) uses the Stanford Background dataset, whose images are 320x240. The model backbone was trained on Cityscapes with 512x1024 inputs. However, the tutorial config sets cfg.crop_size = (256, 256) and later adds these augmentations:
My questions are:
Since the model layers are unchanged, I am guessing that somewhere inside the model the input is always rescaled to 512x1024. Is that true?
What is the purpose of crop_size? Should it ideally match the original training size, i.e., 512x1024 in this case?
What are the consequences of using a different crop_size? Does it reduce performance?
How were crop_size and Resize chosen here, and what is the general idea for choosing them?
In my case, my dataset has 512x512 images and my model is Mask2Former, trained on Cityscapes at 512x1024. Should I set crop_size to 512x1024 or 512x512? And do I need to add a Resize augmentation?
Edit: I was not able to find any documentation on crop_size in the official docs besides this example: crop_size = (512, 1024) # The crop size during training.
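To make my current understanding concrete, here is a minimal NumPy sketch of what I assume a random resize followed by a random crop does to one training image. This is not MMSegmentation's API: the helper names random_resize and random_crop_and_pad are hypothetical, the ratio range (0.5, 2.0) is just an illustrative value, and the resize is a crude nearest-neighbour one. Only the 320x240 image size and the (256, 256) crop size come from the tutorial.

```python
import numpy as np

def random_resize(img, base_hw, ratio_range, rng):
    """Rescale img to base_hw times a random ratio (nearest-neighbour, illustrative only)."""
    ratio = rng.uniform(*ratio_range)
    new_h = max(1, int(round(base_hw[0] * ratio)))
    new_w = max(1, int(round(base_hw[1] * ratio)))
    # Map every output row/column back to a source row/column.
    rows = np.arange(new_h) * img.shape[0] // new_h
    cols = np.arange(new_w) * img.shape[1] // new_w
    return img[rows][:, cols]

def random_crop_and_pad(img, crop_hw, rng, pad_val=0):
    """Cut a random window of at most crop_hw, then pad it up to exactly crop_hw."""
    crop_h, crop_w = crop_hw
    h, w = img.shape[:2]
    top = rng.integers(0, max(h - crop_h, 0) + 1)
    left = rng.integers(0, max(w - crop_w, 0) + 1)
    window = img[top:top + crop_h, left:left + crop_w]
    # If the resized image is smaller than the crop, fill the rest with pad_val.
    pad_h = crop_h - window.shape[0]
    pad_w = crop_w - window.shape[1]
    pad = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (img.ndim - 2)
    return np.pad(window, pad, constant_values=pad_val)

rng = np.random.default_rng(0)
# Dummy 320x240 (width x height) RGB image, stored as H x W x C.
image = rng.integers(0, 256, size=(240, 320, 3), dtype=np.uint8)
resized = random_resize(image, base_hw=(240, 320), ratio_range=(0.5, 2.0), rng=rng)
patch = random_crop_and_pad(resized, crop_hw=(256, 256), rng=rng)
print(resized.shape, patch.shape)
```

If this picture is right, the network only ever sees crop_size patches during training, regardless of the original image size or the size used for pretraining. That is the part I would like confirmed or corrected.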