microsoft / CSWin-Transformer

CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows, CVPR 2022
MIT License

How do you produce Table 9 (ablation on different attention mechanisms) in the paper? #44

Open rayleizhu opened 1 year ago

rayleizhu commented 1 year ago

Hi, thanks for your nice work. I'm comparing different attention mechanisms and want to follow your experimental settings. I have two questions:

  1. Why is the reported mIoU for Swin-T 41.9 in Table 9, while it is 46.1 in the Swin paper?
  2. Could you provide the detailed experimental settings for the semantic segmentation and object detection results in Table 9?