microsoft / CSWin-Transformer

CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows, CVPR 2022
MIT License

Function of drop_rate, attn_drop_rate and drop_path_rate: what should they be set to in order to improve mAP? #37

Open LUO77123 opened 2 years ago

LUO77123 commented 2 years ago

1. Since drop_rate, attn_drop_rate and drop_path_rate are 0 by default, drop_path is not enabled. What values of drop_rate, attn_drop_rate and drop_path_rate should be set for the model to perform better? The paper does not mention them, and the source code defaults them to 0. Thanks!
2. The paper compares against Swin as a Mask R-CNN backbone. Swin-T's initial channel number (dim) is 96 while CSWin-T's is 64; does the detection backbone use the CSWin-T configuration from the following table (a constructor sketch follows the table)?

| Models | #Dim | #Blocks | sw | #heads | #Param. | FLOPs |
| --- | --- | --- | --- | --- | --- | --- |
| CSWin-T | 64 | 1,2,21,1 | 1,2,7,7 | 2,4,8,16 | 23M | 4.3G |
| CSWin-S | 64 | 2,4,32,2 | 1,2,7,7 | 2,4,8,16 | 35M | 6.9G |
| CSWin-B | 96 | 2,4,32,2 | 1,2,7,7 | 4,8,16,32 | 78M | 15.0G |
| CSWin-L | 144 | 2,4,32,2 | 1,2,7,7 | 6,12,24,48 | 173M | 31.5G |
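For concreteness, the CSWin-T row maps onto the model constructor roughly like this. A minimal sketch based on my reading of the registered models in cswin.py; the import path and exact keyword names are assumptions about the local checkout:

```python
# Sketch: CSWin-T as listed in the table above. Keyword names follow
# cswin.py's CSWinTransformer; treat the import path as an assumption.
from models.cswin import CSWinTransformer

cswin_t = CSWinTransformer(
    patch_size=4,
    embed_dim=64,               # #Dim column
    depth=[1, 2, 21, 1],        # #Blocks column
    split_size=[1, 2, 7, 7],    # sw column
    num_heads=[2, 4, 8, 16],    # #heads column
    mlp_ratio=4.0,
)
```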

Andy1621 commented 2 years ago

Here I can share some experience from my UniFormer; you can also follow our work.

  1. drop_path_rate is already used in the models. As for dropout, it adds nothing once drop path is enabled (see the schedule sketch below).
  2. The backbones are identical across classification, detection and segmentation.
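As a reference for point 1, here is a minimal sketch of how a nonzero drop_path_rate is typically turned into per-block stochastic-depth probabilities in timm-style backbones such as this one; the chosen rate of 0.1 is illustrative, not a recommendation from the repo:

```python
import torch
from timm.models.layers import DropPath  # timm's stochastic-depth layer

# Per-stage block counts for CSWin-T (from the table above).
depth = [1, 2, 21, 1]
drop_path_rate = 0.1  # illustrative; the repo defaults this to 0

# Linearly ramp the drop probability from 0 at the first block up to
# drop_path_rate at the last block, then hand one value to each block.
dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(depth))]
stage1_droppaths = [DropPath(dpr[i]) for i in range(depth[0])]
print([round(p, 3) for p in dpr])  # one probability per block, all stages
```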
LUO77123 commented 2 years ago

> Here I can share some experience from my UniFormer; you can also follow our work.
>
> 1. drop_path_rate is already used in the models. As for dropout, it adds nothing once drop path is enabled.
> 2. The backbones are identical across classification, detection and segmentation.

One last question. Line 159 of cswin.py (`if last_stage: self.branch_num = 1`) and lines 165-171 show that in the last stage LePEAttention runs only once (with a 224 input the last-stage feature map is 7x7, so the window is the full 7x7). For downstream detection, however, the last-stage feature map is not 7x7: with an 896 input it is 28x28, so a single LePEAttention pass would use a 28x28 window. So for downstream tasks, how many times does the last-stage LePEAttention run (1 or 2)? And since the released pretrained weights are all for 224x224 (window 7x7) or 384x384 (window 12x12), where the last stage runs LePEAttention only once, there are no pretrained weights with two branches. How did the authors use them for downstream tasks?
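For reference, a sketch of the stripe-shape logic being asked about, paraphrased from LePEAttention in cswin.py (simplified; the real module derives H_sp/W_sp internally, and idx == -1 is the single last-stage branch):

```python
def stripe_shape(idx, resolution, split_size):
    """Window (stripe) shape used by one LePEAttention branch.
    idx == -1: full window (last stage, branch_num == 1)
    idx ==  0: horizontal stripe; idx == 1: vertical stripe."""
    if idx == -1:
        return resolution, resolution
    if idx == 0:
        return resolution, split_size
    return split_size, resolution

# Classification, 224 input: stage-4 map is 7x7 -> one full 7x7 window.
print(stripe_shape(-1, 7, 7))    # (7, 7)
# Detection, 896 input: stage-4 map is 28x28 -> the single branch covers 28x28.
print(stripe_shape(-1, 28, 7))   # (28, 28)
```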

go-ahead-maker commented 2 years ago

Looked at this way, Swin's fixed 7x7 window is also not full attention at stage 4 on downstream tasks (e.g. detection) if the window size is not fine-tuned, and that seems to be the accepted default there too. So for downstream tasks, stage 4 here can presumably be treated like Swin with a 7x7 window.

LUO77123 commented 2 years ago

> Looked at this way, Swin's fixed 7x7 window is also not full attention at stage 4 on downstream tasks (e.g. detection) if the window size is not fine-tuned, and that seems to be the accepted default there too. So for downstream tasks, stage 4 here can presumably be treated like Swin with a 7x7 window.

A 7x7 window would require two LePEAttention branches, so loading the pretrained weights would only cover about three quarters of the parameters. It seems simpler to just attend over the whole feature map.
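To illustrate the weight-loading point, a minimal sketch of partial loading with `strict=False`; `build_cswin_backbone`, the checkpoint key and the filename are hypothetical placeholders, not the repo's actual API:

```python
import torch

def load_pretrained_partially(model, ckpt_path):
    """Load a 224x224-pretrained CSWin checkpoint into a detection backbone,
    skipping parameters whose names or shapes do not match (e.g. a second
    last-stage LePEAttention branch absent from the single-branch checkpoint)."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state = ckpt.get("state_dict", ckpt)  # key name assumed; varies by release
    missing, unexpected = model.load_state_dict(state, strict=False)
    print("left at random init:", missing)
    print("ignored from checkpoint:", unexpected)
    return model

# model = build_cswin_backbone()                       # hypothetical builder
# load_pretrained_partially(model, "cswin_tiny_224.pth")  # path illustrative
```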