Hi, great idea!
Recently I have been trying to reproduce the excellent iFormer, but I got stuck on a few details, so I would like to ask you:
① Upsampling method of the low-frequency branch: is it implemented with interpolation or with a transposed convolution (TransposeConv)? (A minimal sketch of the two options I am considering is given after this list.)
② The channel split ratio changes within Stage 3: how exactly does the channel split ratio vary across Stage 3 for iFormer-S, iFormer-B, and iFormer-L?
③ The paper mentions that LayerScale is used during training. Is LayerScale also used when training with 224×224 inputs? (The LayerScale form I am assuming is sketched after this list.)
④ The paper states that the training configuration follows the standard recipes of [6, 22, 29]. Which of these three references is the training configuration actually based on?
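
For question ①, these are the two upsampling variants I am choosing between for the low-frequency branch. This is only a minimal PyTorch sketch with dummy shapes and channel counts of my own, not your implementation:

```python
import torch
import torch.nn as nn

# Option A: parameter-free bilinear interpolation
upsample_interp = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)

# Option B: learnable transposed convolution
upsample_deconv = nn.ConvTranspose2d(in_channels=64, out_channels=64,
                                     kernel_size=2, stride=2)

x = torch.randn(1, 64, 14, 14)       # dummy low-frequency feature map
print(upsample_interp(x).shape)      # torch.Size([1, 64, 28, 28])
print(upsample_deconv(x).shape)      # torch.Size([1, 64, 28, 28])
```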
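
For question ③, this is the LayerScale definition I am assuming (CaiT-style per-channel scaling; the init value is my guess). Please correct me if the 224×224 models drop it or use a different setting:

```python
import torch
import torch.nn as nn

class LayerScale(nn.Module):
    """Learnable per-channel scaling applied to a branch output (init value assumed)."""
    def __init__(self, dim, init_value=1e-6):
        super().__init__()
        self.gamma = nn.Parameter(init_value * torch.ones(dim))

    def forward(self, x):
        # x: (batch, tokens, dim); gamma broadcasts over batch and token dimensions
        return x * self.gamma
```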