Hi, sorry to disturb you again. I want to ask a question about training with the Swin Transformer, so the title may not be appropriate.
I have successfully reproduced the UDA result of SegFormer in Table 1, which finally reaches 58.82, close to your reported result.
Meanwhile, I ran the same experiment with Swin-B; however, the result was worse than SegFormer's: the best performance was 48.1 at 24,000 iters (the training stopped unexpectedly), whereas SegFormer reaches 53.81 at 24,000 iters. The dataset, training schedule, and other parameters are the same as for SegFormer; the only modification is the model.
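For reference, the backbone swap I made looks roughly like the following mmseg-style config fragment. This is only an illustrative sketch; the key names and Swin-B hyperparameters shown here are assumptions, not the exact config I used.

```python
# Illustrative mmseg-style config fragment: swapping the SegFormer (MiT-B5)
# encoder for Swin-B while keeping everything else unchanged.
# All names/values here are assumptions for illustration.
model = dict(
    type='EncoderDecoder',
    backbone=dict(
        type='SwinTransformer',    # replaces the MiT-B5 encoder of SegFormer
        embed_dims=128,            # Swin-B channel width
        depths=[2, 2, 18, 2],      # Swin-B stage depths
        num_heads=[4, 8, 16, 32],  # attention heads per stage
        window_size=7,
    ),
    # decode head, UDA wrapper (DACS), optimizer, schedule, and data
    # pipelines are left exactly as in the SegFormer experiment.
)
```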
The training log is here: gta2cs_dacs_swin_base_poly10warm_s0.log
Did you run experiments with the Swin Transformer before? Why is there such a big performance gap? Could you share your thoughts on this?
Looking forward to your reply. :-)