yan-hao-tian/VW: Varying Window Attention (ICLR 2024 poster)

The performance #5

Closed: ydhongHIT closed this issue 5 months ago

ydhongHIT commented 2 years ago

Hi, thanks for your great work. I trained Lawin + MiT-B2 for 80k iterations and the final performance is 46.64 mIoU. The training protocol is exactly the same as SegFormer's. Here is the log file: 20220402_071143.log

yan-hao-tian commented 2 years ago

Hi ydhong. Have you tried training Lawin-B2 for 160k iterations? The performance reported in Table 1 is obtained with a 160k training schedule.
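For reference, a minimal sketch of what such a 160k schedule looks like in mmseg config conventions; the values below follow the public SegFormer ADE20K configs, not necessarily this repo's exact files:

```python
# SegFormer-style schedule in mmseg config conventions (assumed values,
# mirroring the public SegFormer configs rather than this repo).
optimizer = dict(
    type='AdamW', lr=6e-5, betas=(0.9, 0.999), weight_decay=0.01)
lr_config = dict(
    policy='poly', warmup='linear', warmup_iters=1500,
    warmup_ratio=1e-6, power=1.0, min_lr=0.0, by_epoch=False)
# raising max_iters from 80000 to 160000 is the change discussed above
runner = dict(type='IterBasedRunner', max_iters=160000)
checkpoint_config = dict(by_epoch=False, interval=16000)
evaluation = dict(interval=16000, metric='mIoU')
```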

ydhongHIT commented 2 years ago

> Hi ydhong. Have you tried training Lawin-B2 for 160k iterations? The performance reported in Table 1 is obtained with a 160k training schedule.

Thanks for your reply. In my experience, training for 160k iterations won't improve the result much, but I will try it and see. Besides, I trained Lawin + CSWin-T for 160k iterations and still saw no obvious improvement. By the way, should embed_dim*3 here https://github.com/yan-hao-tian/lawin/blob/30d3cdb20d6faf03e3eac11c2c23de4fbb5639fe/lawin_head.py#L148 be changed to 512?

yan-hao-tian commented 2 years ago

Yes, it should be 512. Also, I recommend switching the proj_type in PatchEmbed from 'pool' to 'conv', which uses a group conv in place of the mix pooling at very little extra cost. https://github.com/yan-hao-tian/lawin/blob/92380f80a7e98b44207378dc6cfabf8dcb03f6eb/lawin_head.py#L185-L187 By the way, what is the competitor for Lawin + CSWin-T? UperNet or Semantic FPN?
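For illustration, a minimal sketch of the two proj_type options under discussion; the class layout, shapes, and the exact grouping here are assumptions, not the repo's actual implementation:

```python
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Downsamples a (B, C, H, W) feature map by `patch_size`, either with
    mix pooling ('pool') or a grouped strided convolution ('conv')."""

    def __init__(self, dim=512, patch_size=4, proj_type='conv'):
        super().__init__()
        self.proj_type = proj_type
        if proj_type == 'conv':
            # grouped conv: groups=dim keeps the parameter/FLOP overhead small
            self.proj = nn.Conv2d(dim, dim, kernel_size=patch_size,
                                  stride=patch_size, groups=dim)
        elif proj_type == 'pool':
            # mix pooling: average of avg-pool and max-pool outputs
            self.avg = nn.AvgPool2d(patch_size, stride=patch_size)
            self.max = nn.MaxPool2d(patch_size, stride=patch_size)
        else:
            raise ValueError(f'unknown proj_type: {proj_type}')

    def forward(self, x):
        if self.proj_type == 'conv':
            return self.proj(x)
        return 0.5 * (self.avg(x) + self.max(x))
```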

ydhongHIT commented 2 years ago

> Yes, it should be 512. Also, I recommend switching the proj_type in PatchEmbed from 'pool' to 'conv', which uses a group conv in place of the mix pooling at very little extra cost.
>
> https://github.com/yan-hao-tian/lawin/blob/92380f80a7e98b44207378dc6cfabf8dcb03f6eb/lawin_head.py#L185-L187
>
> By the way, what is the competitor for Lawin + CSWin-T? UperNet or Semantic FPN?

The baseline is CSWin-T + an MLP decoder similar to SegFormer's. I just want to reproduce the results of your paper, in which you said you use the pooling. How much gain does using the group conv instead of mix pooling bring?

ydhongHIT commented 2 years ago

During inference, I find that your model requires the input resolution to be a multiple of 64. For ADE20K, I use 'ResizeToMultiple' in mmseg to achieve this. There may be some other details that cannot be presented in the paper. So when are you going to release the code? Thanks again.
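For anyone reproducing this, a sketch of an mmseg test pipeline using that transform; the img_scale and normalization values are assumptions copied from common ADE20K configs, not this repo's:

```python
# Assumed mmseg test pipeline enforcing H and W to be multiples of 64.
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(2048, 512),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            # resizes so the model's multiple-of-64 requirement holds
            dict(type='ResizeToMultiple', size_divisor=64),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
```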

yan-hao-tian commented 2 years ago

Sorry for the late reply. Honestly, we have to delay the full code release because Lawin has not been accepted by any conference or journal so far, and we are currently writing a new version of the paper.

ydhongHIT commented 2 years ago

> Sorry for the late reply. Honestly, we have to delay the full code release because Lawin has not been accepted by any conference or journal so far, and we are currently writing a new version of the paper.

Sorry to hear that. I noticed that your reproduced Swin results are much higher than the original ones. For example, your UperNet-Swin-B achieves 53.0 mIoU, 1.4 points higher than the original 51.6. Could you send me your training config files for Swin and the corresponding Lawin-Swin? My email: 2380838460@qq.com