joshyZhou / AST

Adapt or Perish: Adaptive Sparse Transformer with Attentive Feature Refinement for Image Restoration

about encoder design #8

Open zhaozhaoooo opened 1 week ago

zhaozhaoooo commented 1 week ago

Hello, I noticed a paragraph in the paper stating briefly that the attention operation is omitted in the encoder module, so the encoder consists only of FFN layers:

> Here, we omit the attention mechanism within the standard transformer block in the encoder, due to the fact that its low-pass filter nature can hinder learning desired local patterns, especially in the early stages.

I have some doubts about this design: fully connected layers can be seen as convolutional layers with a 1×1 kernel, and the FFN layers may include only a small number of 3×3 convolutional layers. Doesn't this make the model's receptive field too small?
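For concreteness, here is a minimal PyTorch sketch of the kind of conv-based FFN block I mean (the class name, expansion ratio, and exact structure are my assumptions for illustration, not the repository's actual implementation):

```python
import torch
import torch.nn as nn

class ConvFFN(nn.Module):
    """Hypothetical conv-based FFN block: 1x1 convs act as per-pixel
    fully connected layers, and a single 3x3 depthwise conv provides
    the only spatial mixing."""
    def __init__(self, dim, expansion=4):
        super().__init__()
        hidden = dim * expansion
        self.expand = nn.Conv2d(dim, hidden, kernel_size=1)   # pointwise: no spatial mixing
        self.dwconv = nn.Conv2d(hidden, hidden, kernel_size=3,
                                padding=1, groups=hidden)     # depthwise 3x3: the only spatial mixing
        self.act = nn.GELU()
        self.project = nn.Conv2d(hidden, dim, kernel_size=1)  # pointwise projection back to dim

    def forward(self, x):
        return x + self.project(self.act(self.dwconv(self.expand(x))))

x = torch.randn(1, 48, 64, 64)
print(ConvFFN(48)(x).shape)  # torch.Size([1, 48, 64, 64])
```

A single such block only mixes information within a 3×3 neighborhood, which is where my receptive-field concern comes from.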

joshyZhou commented 1 day ago

Hi,

Here are some insights into our design approach:

As noted by Xiao et al. [1], early layers of transformers tend to focus on learning local patterns, which somewhat undermines the advantage of self-attention’s large receptive field. To address this, incorporating convolutional layers can be an efficient strategy, as they excel at capturing local patterns. This approach is also evident in other transformer-based models, such as FFTformer [2] and FPro [3].
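As a rough back-of-the-envelope illustration of this point, here is a small sketch computing the effective receptive field of a purely convolutional encoder. The per-level block counts and 2× downsampling are assumptions for the example, not the paper's exact configuration:

```python
def receptive_field(layers):
    """Effective receptive field (in input pixels) of a sequential stack,
    where each layer is a (kernel_size, stride) pair, using the standard
    recursion: rf += (k - 1) * jump; jump *= stride."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

# Hypothetical 3-level encoder: two FFN blocks (each with one 3x3 conv)
# per level, with 2x downsampling between levels.
stage = [(3, 1), (3, 1)]
encoder = stage + [(2, 2)] + stage + [(2, 2)] + stage
print(receptive_field(encoder))  # 32
```

Because of the downsampling between levels, each 3×3 conv at a deeper level mixes over a much larger region of the original input, so the receptive field grows quickly with depth even without attention.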

References:
[1] T. Xiao, M. Singh, E. Mintun, T. Darrell, P. Dollár, and R. Girshick. "Early Convolutions Help Transformers See Better." In NeurIPS, 2021.
[2] L. Kong, J. Dong, J. Ge, M. Li, and J. Pan. "Efficient Frequency Domain-based Transformers for High-Quality Image Deblurring." In CVPR, 2023.
[3] S. Zhou, J. Pan, J. Shi, D. Chen, L. Qu, and J. Yang. "Seeing the Unseen: A Frequency Prompt Guided Transformer for Image Restoration." In ECCV, 2024.