zhengchen1999 / RGT

PyTorch code for our ICLR 2024 paper "Recursive Generalization Transformer for Image Super-Resolution"
Apache License 2.0

Comparison to DAT? #3

Open Phhofm opened 6 months ago

Phhofm commented 6 months ago

Hey, thank you for your work. I had just a quick question.

I didn't look into (or understand) the technical details too much; I am writing simply from the practical/application side (I tried training some models).

Performance-wise, from an application standpoint, RGT feels fairly similar to DAT to me personally, both in training/inference speed and in output quality.

I also recently did an inference speed test, and RGT got very similar speeds to DAT (and RGT-S to DAT-S).

On the theoretical side, I simply wanted to ask whether there is a use case where DAT would be preferable to (/should perform better than) RGT, or RGT preferable to DAT?

Ah, PS: here are some outputs/examples from an RGT model I recently trained:

Slowpoke Pics 6 Examples

4xRealWebPhoto_RGT_250k_comparison

4xRealWebPhoto_RGT_250k_comparison_2

---- The inference speed test mentioned above:

4x Inference speed test, neosr testscripts, 50 256x256 images as input, Ubuntu (Budgie) 23.10, GeForce RTX 3060 Lite Hash Rate, AMD Ryzen™ 5 3600 × 12

Networks sorted by their fastest run out of 3 runs each (I initially started with more runs, so Compact and CUGAN had more, but once I hit DAT I switched to 3 runs each):

Compact: 1.90s, 26.35fps
SPAN: 2.33s, 21.44fps
SAFMN: 2.51s, 19.89fps
DITN: 4.26s, 11.72fps
CUGAN: 4.45s, 11.22fps
OmniSR: 8.90s, 5.62fps
SAFMN-L: 9.87s, 5.07fps
CRAFT: 11.26s, 4.44fps
DCTLSA: 11.53s, 4.43fps
SwinIR-S: 14.18s, 3.53fps
SRFormer-light: 16.28s, 3.07fps
ESRGAN: 22.51s, 2.22fps
SwinIR-M: 46.46s, 1.08fps
HAT-S: 71.37s, 0.70fps
RGT_S: 74.83s, 0.67fps
DAT-S: 74.96s, 0.67fps
SRFormer-M: 79.02s, 0.63fps
DAT2: 81.90s, 0.61fps
HAT-M: 90.19s, 0.55fps
RGT: 96.07s, 0.52fps
DAT: 97.08s, 0.52fps
HAT-L: 177.75s, 0.28fps
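
For reference, here is a minimal sketch of the kind of per-network timing loop such a benchmark uses (this is not the actual neosr test script; the `benchmark` helper, the input folder, and the assumption that `model` is any loaded 4x SR network are placeholders for illustration):

```python
# Minimal timing-loop sketch (not the actual neosr test script).
# `model` is assumed to be any loaded 4x SR network; the input folder is a placeholder.
import glob
import time

import torch
from torchvision.io import ImageReadMode, read_image

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
paths = sorted(glob.glob("inputs/*.png"))[:50]  # 50 test images, e.g. 256x256


def benchmark(model: torch.nn.Module) -> None:
    model = model.to(device).eval()
    if device.type == "cuda":
        torch.cuda.synchronize()
    start = time.time()
    with torch.no_grad():
        for p in paths:
            lr = read_image(p, ImageReadMode.RGB).float().div(255).unsqueeze(0).to(device)
            _ = model(lr)  # 4x upscaled output
    if device.type == "cuda":
        torch.cuda.synchronize()  # make sure the GPU has finished before stopping the clock
    elapsed = time.time() - start
    print(f"{elapsed:.2f}s, {len(paths) / elapsed:.2f}fps")
```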

zhengchen1999 commented 6 months ago

Hi,

Thank you for your detailed testing. The latency of RGT and DAT is close, which aligns with the calculated Params and FLOPs. However, RGT performs better; for instance, on the Urban100 dataset, the specific comparisons are as follows:
| Method | Urban100 ×2 (PSNR, dB) | Urban100 ×4 (PSNR, dB) |
|--------|------------------------|------------------------|
| DAT-S  | 34.12 | 27.68 |
| RGT-S  | 34.32 | 27.89 |
| DAT    | 34.37 | 27.87 |
| RGT    | 34.47 | 27.98 |

This is because, unlike DAT, which employs channel attention to achieve linear complexity, RGT adopts linear global spatial attention, which is more suited for SR tasks.
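
For intuition, here is a rough PyTorch sketch of the two kinds of linear-complexity attention being compared. It is a simplification for illustration only: plain average pooling stands in for RGT's recursive aggregation, the class and layer names are invented for the example, and neither block is the actual DAT or RGT code.

```python
# Conceptual sketch only (PyTorch 2.x): two ways of keeping global attention linear
# in the number of pixels N = H*W. Neither class is the exact DAT or RGT module.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelAttention(nn.Module):
    """DAT-style idea: attention across channels (a C x C map), linear in N."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.heads = heads
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, N, C)
        B, N, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # move to (B, heads, C//heads, N): the attention map is over the channel axis
        q = q.transpose(1, 2).reshape(B, self.heads, C // self.heads, N)
        k = k.transpose(1, 2).reshape(B, self.heads, C // self.heads, N)
        v = v.transpose(1, 2).reshape(B, self.heads, C // self.heads, N)
        attn = (F.normalize(q, dim=-1) @ F.normalize(k, dim=-1).transpose(-2, -1)).softmax(-1)
        out = (attn @ v).reshape(B, C, N).transpose(1, 2)
        return self.proj(out)


class AggregatedSpatialAttention(nn.Module):
    """RGT-flavoured idea (simplified): shrink K/V to a few representative spatial
    tokens, then let every pixel cross-attend to them -> global spatial mixing at
    O(N * M) cost with M << N."""

    def __init__(self, dim: int, heads: int = 4, reduce: int = 8):
        super().__init__()
        self.heads, self.reduce = heads, reduce
        self.q = nn.Linear(dim, dim, bias=False)
        self.kv = nn.Linear(dim, dim * 2, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, hw: tuple) -> torch.Tensor:  # x: (B, N, C)
        B, N, C = x.shape
        H, W = hw
        # aggregate the feature map down to M = (H/r)*(W/r) representative tokens
        xr = x.transpose(1, 2).reshape(B, C, H, W)
        xr = F.adaptive_avg_pool2d(xr, (H // self.reduce, W // self.reduce))
        xr = xr.flatten(2).transpose(1, 2)  # (B, M, C)
        q = self.q(x).reshape(B, N, self.heads, C // self.heads).transpose(1, 2)
        k, v = self.kv(xr).chunk(2, dim=-1)
        k = k.reshape(B, -1, self.heads, C // self.heads).transpose(1, 2)
        v = v.reshape(B, -1, self.heads, C // self.heads).transpose(1, 2)
        out = F.scaled_dot_product_attention(q, k, v)  # (B, heads, N, C//heads)
        return self.proj(out.transpose(1, 2).reshape(B, N, C))
```

Both keep the cost linear in N, but only the second attends across spatial positions directly, which is the distinction made above.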

Phhofm commented 6 months ago

Sounds good, thank you for your work :)

I was able to train an RGT and an RGT-S model, and the results look good. My latest RGT-S one can be found here. In this PDF I wrote down my approach and the degradation workflow for this model.

It's called '4xRealWebPhoto_v2_rgt_s' since the idea was to upscale photos downloaded from the web, so I modeled the dataset degradations to include scaling and compression followed by rescaling and recompression (as done by a service provider when a user uploads to social media or similar, then someone else downloads and re-uploads again). I also added realistic noise and some slight lens blur.
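
For illustration, here is a rough Pillow sketch of that kind of two-round degradation. The function name, scale factors, JPEG quality ranges, and noise/blur strengths are placeholders, not the exact values from the linked PDF, and plain Gaussian noise stands in for the "realistic noise" step:

```python
# Rough sketch of the described two-round "web" degradation: lens blur, downscale +
# JPEG compress (first upload), noise, then rescale + recompress (download and
# re-upload). All parameter ranges are illustrative placeholders, not the exact
# values from the 4xRealWebPhoto workflow PDF.
import io
import random

import numpy as np
from PIL import Image, ImageFilter


def jpeg_roundtrip(img: Image.Image, quality: int) -> Image.Image:
    buf = io.BytesIO()
    img.save(buf, "JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")


def web_degrade(hr: Image.Image, scale: int = 4) -> Image.Image:
    img = hr.convert("RGB")
    # slight lens blur
    img = img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.2, 1.0)))
    # round 1: downscale to the LR size and JPEG-compress (first upload)
    img = img.resize((hr.width // scale, hr.height // scale), Image.BICUBIC)
    img = jpeg_roundtrip(img, random.randint(60, 95))
    # stand-in for realistic noise (plain Gaussian here)
    arr = np.asarray(img).astype(np.float32)
    arr += np.random.normal(0.0, random.uniform(1.0, 6.0), arr.shape)
    img = Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
    # round 2: rescale and recompress (someone downloads and re-uploads)
    w, h = img.size
    f = random.uniform(0.7, 1.0)
    img = img.resize((max(1, int(w * f)), max(1, int(h * f))), Image.BICUBIC)
    img = img.resize((w, h), Image.BICUBIC)
    return jpeg_roundtrip(img, random.randint(50, 90))
```

Running something like this over the HR set yields the paired LR images; the actual workflow in the PDF is more involved.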

12 Slowpoke Pics examples of the model's output at this link

And below, simply four examples: 4xRealWebPhoto_v2_rgt_s_ex1 4xRealWebPhoto_v2_rgt_s_ex2 4xRealWebPhoto_v2_rgt_s_ex4 4xRealWebPhoto_v2_rgt_s_ex3

zelenooki87 commented 5 months ago

@Phhofm Hi. I was wondering if you could create a tutorial on how to further train models. I have an RTX 3090 graphics card with 24GB VRAM and 32GB of RAM, and I'd like to train a model for the first time. By the way, I've been using open-source projects from GitHub for 4-5 years now, so I believe with a little help I can get the hang of it. Thanks a lot!

Phhofm commented 5 months ago

Hey, you could definitely get into training your own upscaling models. You can find a few links in this readme that could help get you started: training-info. neosr is what I have been using for training. I might suggest starting by training a compact model first; that helps you gather experience, because for it you will have to download a dataset, then use otf or degrade it, get the config right, set up dependencies, and so forth. Once you have started training a compact model with default config values on a standard dataset and you see validation output being generated, you should have gathered enough experience to make/degrade your own dataset, tinker with config values, and move to a bigger transformer arch like this one. Or at least this is how I started, and I think it's a good way. It might also be good to join the Discord community linked in the readme and ask questions there, because there are more people who can answer your questions or help with errors you run into. But yeah, the linked readme should get you started with resources.
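
As a concrete first step once you have a paired dataset, a small framework-agnostic sanity check like the sketch below can catch missing or mis-sized LR/HR pairs before you start a long training run. The folder names and the `check_pairs` helper are placeholders for however your dataset is laid out, not part of neosr:

```python
# Small, framework-agnostic sanity check to run before training: confirms every HR
# image has a matching LR image and that their sizes differ by exactly the chosen
# scale. The folder names are placeholders for however your dataset is laid out.
from pathlib import Path

from PIL import Image


def check_pairs(hr_dir: str = "dataset/hr", lr_dir: str = "dataset/lr", scale: int = 4) -> None:
    hr_dir, lr_dir = Path(hr_dir), Path(lr_dir)
    problems = 0
    for hr_path in sorted(hr_dir.iterdir()):
        lr_path = lr_dir / hr_path.name
        if not lr_path.exists():
            print(f"missing LR for {hr_path.name}")
            problems += 1
            continue
        hw, hh = Image.open(hr_path).size
        lw, lh = Image.open(lr_path).size
        if (lw * scale, lh * scale) != (hw, hh):
            print(f"size mismatch for {hr_path.name}: HR {hw}x{hh}, LR {lw}x{lh}")
            problems += 1
    print(f"{problems} problem(s) found")


if __name__ == "__main__":
    check_pairs()
```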

zelenooki87 commented 5 months ago

@Phhofm Would you mind updating the link to the Google Drive folder with new models in your "models" GitHub repository? I've successfully used some of them in Hybrid, a video editing program. Thanks!