zhengchen1999 / DAT

PyTorch code for our ICCV 2023 paper "Dual Aggregation Transformer for Image Super-Resolution"
Apache License 2.0
386 stars 37 forks

Big model version? #8

Open Phhofm opened 1 year ago

Phhofm commented 1 year ago

Thank you for being so active on this repo and for answering people in the GitHub Issues :) That is great :)

This one is simply an idea/question; I thought I'd ask whether it's even worth pursuing (if you don't know, no worries).

When looking at HAT, they made a HAT-L model (since DAT uses similar config parameters, it reminded me of HAT), and the only thing they changed in their config was the depths and num_heads parameters, from an array of 6x'6' to an array of 12x'6'.

image

I wondered if we could push DAT a bit in the same way and apply the same change:

image
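As a rough sanity check on what that change costs, here is a back-of-the-envelope parameter estimate. The per-block formula and the MLP ratio are generic transformer assumptions, not DAT's exact layers; only the conclusion that the body scales linearly with total block count matters:

```python
def block_params(dim, mlp_ratio=2.0):
    """Rough parameter count of one generic transformer block
    (qkv + output projection, plus a two-linear MLP). Illustrative
    assumptions only, not DAT's actual layer layout."""
    attn = 4 * dim * dim                   # qkv (3*dim*dim) + output proj (dim*dim)
    mlp = int(2 * dim * dim * mlp_ratio)   # two MLP linears with expansion ratio
    return attn + mlp

dim = 180                            # embedding dim used in DAT's base config
base = 6 * 6 * block_params(dim)     # depths = [6]*6  -> 36 blocks
big = 12 * 6 * block_params(dim)     # depths = [6]*12 -> 72 blocks
print(big / base)                    # prints 2.0: the transformer body doubles
```

So going from six groups to twelve groups roughly doubles the body's parameters and FLOPs, which matches the ~613 MB net_g file being about twice the usual size.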

My local resources are very limited, so I just let it run for one night of training. This is only 41k iterations from scratch, the net_g model file is around 613 MB, and the validation output seems to work (1k-iter steps):

DAT_B_val

My question is whether it even makes sense to simply increase depths and num_heads, and whether you think this could lead to better visual output. Is this applicable to DAT at all, given that it's a different network? It was just an idea I had; I'm only trying things out, so it might not be worth it, or I might need to change other things as well. I simply thought I'd ask.

(And PS, since I'm writing here already: musl had the idea that we could also try the 'nearest+conv' upsampler (alongside the pixelshuffle default in dat, or pixelshuffledirect for dat_light). They used it in SwinIR for real-world SR to reduce blockiness/artifacts, as shown here. So that's just an idea I thought I'd mention.)

image
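For reference, a minimal sketch of what a SwinIR-style 'nearest+conv' reconstruction head looks like for 4x SR. The layer names and feature width are assumptions modeled on SwinIR's real-world setting, not code from DAT:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NearestConvUpsampler(nn.Module):
    """Sketch of a 'nearest+conv' head for 4x SR, in the style of
    SwinIR's real-world setting. num_feat=64 and the layer layout
    are assumptions, not DAT's actual reconstruction module."""
    def __init__(self, num_feat=64, num_out_ch=3):
        super().__init__()
        self.conv_up1 = nn.Conv2d(num_feat, num_feat, 3, 1, 1)
        self.conv_up2 = nn.Conv2d(num_feat, num_feat, 3, 1, 1)
        self.conv_hr = nn.Conv2d(num_feat, num_feat, 3, 1, 1)
        self.conv_last = nn.Conv2d(num_feat, num_out_ch, 3, 1, 1)
        self.lrelu = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):
        # Two stages of 2x nearest-neighbor upsampling, each followed
        # by a conv: 4x total, with no sub-pixel shuffle involved.
        x = self.lrelu(self.conv_up1(F.interpolate(x, scale_factor=2, mode='nearest')))
        x = self.lrelu(self.conv_up2(F.interpolate(x, scale_factor=2, mode='nearest')))
        return self.conv_last(self.lrelu(self.conv_hr(x)))

feats = torch.randn(1, 64, 8, 8)       # dummy deep features from the body
sr = NearestConvUpsampler()(feats)      # -> shape (1, 3, 32, 32)
```

The idea is that nearest-neighbor upsampling followed by convolutions avoids the checkerboard/blocky artifacts that pixelshuffle can produce on real-world (degraded) inputs, at the cost of a few extra convolutions.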

zhengchen1999 commented 1 year ago

Wonderful attempts and questions. Regarding "increasing depths and num_heads in DAT": it can improve performance (I have done similar experiments). However, while increasing the depths improves performance, it also increases FLOPs and Params. I don't think the performance gain is significant enough compared to the added overhead. A similar conclusion can be drawn from the ablation study of SwinIR (Sec. 4.2).

Regarding 'nearest+conv' for real-world SR: I think it might work, but I haven't tried real-world SR, so I'm not sure.

Phhofm commented 1 year ago

Wow thank you for the fast reply :)