Hi, great idea!
Recently I have been trying to reproduce the excellent iFormer, but I got stuck on a few details, so I would like to ask you:
① Upsampling method of the low-frequency branch: is it implemented with interpolation or with a transposed convolution (TransposeConv)? (A minimal sketch of the two options I am considering is given after this list.)
② The channel split ratio changes within Stage 3: how exactly does the channel split ratio vary across Stage 3 for iFormer-S, iFormer-B, and iFormer-L?
③ The paper mentions that LayerScale is used during training. Is LayerScale also used when training with 224×224 inputs? (The LayerScale form I am assuming is sketched after this list.)
④ The paper states that the training configuration follows the standard recipes of [6, 22, 29]. Which of these three references is the training configuration actually based on?
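
For question ①, these are the two upsampling variants I am choosing between for the low-frequency branch. This is only a minimal PyTorch sketch with dummy shapes and channel counts of my own, not your implementation:

```python
import torch
import torch.nn as nn

# Option A: parameter-free bilinear interpolation
upsample_interp = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)

# Option B: learnable transposed convolution
upsample_deconv = nn.ConvTranspose2d(in_channels=64, out_channels=64,
                                     kernel_size=2, stride=2)

x = torch.randn(1, 64, 14, 14)       # dummy low-frequency feature map
print(upsample_interp(x).shape)      # torch.Size([1, 64, 28, 28])
print(upsample_deconv(x).shape)      # torch.Size([1, 64, 28, 28])
```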
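
For question ③, this is the LayerScale definition I am assuming (CaiT-style per-channel scaling; the init value is my guess). Please correct me if the 224×224 models drop it or use a different setting:

```python
import torch
import torch.nn as nn

class LayerScale(nn.Module):
    """Learnable per-channel scaling applied to a branch output (init value assumed)."""
    def __init__(self, dim, init_value=1e-6):
        super().__init__()
        self.gamma = nn.Parameter(init_value * torch.ones(dim))

    def forward(self, x):
        # x: (batch, tokens, dim); gamma broadcasts over batch and token dimensions
        return x * self.gamma
```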