zengkun301 / DCTLSA

Densely Connected Transformer with Linear Self-Attention for Lightweight Image Super-Resolution

Network Feedback #1

Open muslll opened 9 months ago

muslll commented 9 months ago

Hi, first of all thanks to everyone who worked on DCTLSA. I want to give some feedback on this project: I've added it to neosr and trained both bicubic and realistic models with it. However, I made two small changes: I replaced the attention function with scaled_dot_product_attention to improve training speed, and added dropout after out_B (per the research findings of 'Reflash Dropout'). The comparisons below are from a model trained on downscaling algorithms only (specifically nearest, bilinear, bicubic, lanczos, and mitchell), using VGG perceptual loss, LDL, and, at the end of training, FocalFrequency loss. The weights have been released for public use (CC0 license).
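For readers unfamiliar with the swap mentioned above: `torch.nn.functional.scaled_dot_product_attention` fuses the attention computation into one optimized kernel. A minimal NumPy sketch of what that function computes mathematically (shapes and names here are illustrative, not DCTLSA's actual module):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Reference computation: softmax(q @ k^T / sqrt(d)) @ v."""
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)  # (B, L, L)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ v                              # (B, L, d)

rng = np.random.default_rng(0)
q = rng.standard_normal((1, 16, 8))  # (batch, tokens, dim)
k = rng.standard_normal((1, 16, 8))
v = rng.standard_normal((1, 16, 8))
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # (1, 16, 8)
```

In PyTorch, calling the built-in instead of a hand-written version lets the backend pick a fused (e.g. memory-efficient or FlashAttention-style) kernel, which is where the training-speed gain comes from.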

dctlsa_cmp_4

dctlsa_cmp_2

dctlsa_cmp_3

I also tested DCTLSA on complex realistic degradations (noise, compression, blur), and it performed very well:

dctlsa_cmp_anime_2

dctlsa_cmp_anime_0

dctlsa_cmp_anime_1

DCTLSA is a very training-efficient network. Some areas for improvement I noticed:

Thanks again to everyone that worked on this project.

Shiqi72 commented 3 months ago

Hi, I'd like to ask how the FLOPs are calculated. When reproducing the code for ×2, the FLOPs I get don't match the paper:

```python
_model = model.Model(args, checkpoint)
input = torch.randn(1, 3, 170, 170).cuda()
flops, params = profile(_model, inputs=(input, 0))
print("flops", str(flops / 1e9))
print("params", str(params / 1e6))
```

zengkun301 commented 3 months ago

> I'd like to ask how the FLOPs are calculated. When reproducing the code for ×2, the FLOPs I get don't match the paper.

Hello. When computing FLOPs for ×2, the input dimensions should be (1, 3, 640, 360) so that the output size is 1280×720.
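To see why the input size matters: convolution FLOPs grow linearly with the number of input pixels, so profiling at 170×170 instead of the paper's setting changes the total by roughly the pixel-count ratio. A back-of-the-envelope check for a single 3×3 conv layer (the channel count is illustrative, not DCTLSA's exact configuration):

```python
def conv_flops(h, w, c_in, c_out, k=3):
    # Multiply-accumulates for one k x k conv with 'same' padding
    return h * w * c_in * c_out * k * k

c = 64  # illustrative channel width
small = conv_flops(170, 170, c, c)
paper = conv_flops(360, 640, c, c)  # LR input giving a 1280x720 x2 output
print(paper / small)  # ~7.97x more FLOPs per layer
```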

Shiqi72 commented 3 months ago

Thank you very much for the answer. I have one more question from reading the paper: since SA can extract global features, what exactly does adding the 3×3 depth-wise convolution layer in the LFE stage to enlarge the receptive field contribute?

zengkun301 commented 2 months ago

> Since SA can extract global features, what exactly does adding the 3×3 depth-wise convolution layer in the LFE stage to enlarge the receptive field contribute?

Hello. The purpose of the 3×3 depth-wise convolution in the LFE stage is to enlarge the receptive field without introducing many extra parameters (compared with a standard convolution). In theory SA can extract global features, but in practice its ability to capture global information is limited.
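The parameter saving described above is easy to verify: a depth-wise 3×3 convolution learns one k×k filter per channel (C·k·k weights), versus C·C·k·k for a standard convolution mixing all channels. A quick check with an illustrative channel count (not DCTLSA's exact width):

```python
def standard_conv_params(c_in, c_out, k=3):
    return c_in * c_out * k * k  # ignoring bias

def depthwise_conv_params(c, k=3):
    return c * k * k             # one k x k filter per channel

c = 64
print(standard_conv_params(c, c))  # 36864
print(depthwise_conv_params(c))    # 576, i.e. 64x fewer parameters
```

In PyTorch this corresponds to `nn.Conv2d(c, c, 3, padding=1, groups=c)`, which is why the receptive field grows at almost no parameter cost.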