microsoft / CSWin-Transformer

CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows, CVPR 2022
MIT License

I tested several segmentation datasets, and I doubt your reported performance #20

Closed — zhouyi-git closed this issue 2 years ago

matt022899 commented 2 years ago

I have the same problem. I tried to replicate the semantic FPN and UperNet results using the default configs of Swin Transformer and mmseg, since the paper says no extra tricks are used. But my results are about 1–3 points lower than those reported in the paper, and the same holds for the detection models. I also suspect the reported results are either not accurate or rely on special tricks not described in the paper.

matt022899 commented 2 years ago

Also, I noticed many issues concerning the same problem. The best way to prove the effectiveness of the backbone is to release the configs and training logs of the downstream tasks, as Swin Transformer and PVT do. Since the paper says the default configs were used, releasing the training logs and configs would not involve any commercial secrets. I hope the authors will respond positively to make the results convincing.

zhouyi-git commented 2 years ago

> I also noticed many issues concerning the same problem. The best way to prove the effectiveness of the backbone is to release the configs and training logs of the downstream tasks, as Swin Transformer and PVT do. As the paper says, the default configs were used, so releasing the training logs and configs would not involve any commercial secrets. I hope the authors will respond positively to make the results convincing.

Yes. Can we communicate by e-mail? 1092622009@qq.com

LightDXY commented 2 years ago

Code and models have been released.

AnukritiSinghh commented 2 years ago

We noticed the inconsistency too. The results we were able to replicate are at least 4 points lower than those claimed in the paper.