zhengchen1999 / DAT

PyTorch code for our ICCV 2023 paper "Dual Aggregation Transformer for Image Super-Resolution"
Apache License 2.0

More about issue 4 #7

Open MeycL opened 1 year ago

MeycL commented 1 year ago

I have the same question as the author of issue 4: since DAT sets the image size to 64, how should I preprocess a larger input? For example, if I have a 256×256 image as input, do I just resize it?

zhengchen1999 commented 1 year ago

The image size (e.g., 64) in DAT is the input image size during training. That is, we train DAT with input images of size 64×64, but this is purely for training convenience. DAT supports images of any size at inference. For example, under the SR-x2 task, you can feed an input of size 640×380 and get an output image of 1280×760.
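For reference, a minimal inference sketch of this point, assuming the `DAT` class from `basicsr/archs/dat_arch.py`; the constructor arguments below are placeholders, so take the exact values from the released SR-x2 config rather than from this snippet:

```python
import torch
from basicsr.archs.dat_arch import DAT

# Placeholder arguments -- the real values come from the released SR-x2 YAML config.
model = DAT(upscale=2, img_size=64)
model.eval()

# img_size only describes the 64x64 training patches; inference accepts other sizes.
# Depending on the split/window sizes, the input may need padding to a multiple of
# the window size first; 384x640 is chosen here so it divides evenly.
lr = torch.randn(1, 3, 384, 640)
with torch.no_grad():
    sr = model(lr)
print(sr.shape)  # expected: torch.Size([1, 3, 768, 1280]) for the x2 model
```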

styler00dollar commented 1 year ago

How important is it to set img_size during training? I have been fine-tuning the official models for the last few weeks without adjusting this value, and the models still seem to work quite well. Does it matter more if a model is trained from scratch?

Also, as a side note, thanks for releasing 2x models and models with different inference requirements. DAT is currently my favorite network.

zhengchen1999 commented 1 year ago

The img_size should equal the training patch size, but this is not mandatory. Regarding "Does it matter more if a model gets trained from scratch?": during training, the patch size does affect model performance. When the patch size is 48×48, the model's performance is lower than with 64×64. However, the img_size setting itself does not affect performance.

In fact, img_size exists only to simplify the calculation of the attention mask (for SW-SA) during training. Since the input size does not change during training, we precompute and keep the mask corresponding to img_size to speed up the calculation. If the input image size differs from img_size, the mask is simply recalculated (https://github.com/zhengchen1999/DAT/blob/main/basicsr/archs/dat_arch.py#L398).
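To illustrate the caching behavior described above, here is a rough sketch (not the actual DAT code) of how a Swin-style shifted-window mask can be precomputed for the training resolution and recomputed only when the input size differs; the class and function names are made up for this example:

```python
import torch

def calculate_mask(h, w, window_size=8, shift=4):
    """Standard shifted-window attention mask (Swin-style); illustrative only."""
    img_mask = torch.zeros(1, h, w, 1)
    slices = (slice(0, -window_size),
              slice(-window_size, -shift),
              slice(-shift, None))
    cnt = 0
    for hs in slices:
        for ws in slices:
            img_mask[:, hs, ws, :] = cnt  # label each shifted region
            cnt += 1
    # Partition the label map into non-overlapping windows.
    mask = img_mask.view(1, h // window_size, window_size,
                         w // window_size, window_size, 1)
    mask = mask.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size * window_size)
    # Positions from different shifted regions must not attend to each other.
    attn_mask = mask.unsqueeze(1) - mask.unsqueeze(2)
    return attn_mask.masked_fill(attn_mask != 0, -100.0).masked_fill(attn_mask == 0, 0.0)

class MaskCache:
    """Keep the mask for img_size (the training patch size) and recompute it
    only for inputs of a different resolution."""
    def __init__(self, img_size=64):
        self.img_size = img_size
        self.cached = calculate_mask(img_size, img_size)  # computed once

    def get(self, h, w):
        if (h, w) == (self.img_size, self.img_size):
            return self.cached            # fast path during training
        return calculate_mask(h, w)       # recomputed at inference if sizes differ
```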

styler00dollar commented 1 year ago

Thanks for the quick reply.