microsoft / CSWin-Transformer

CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows, CVPR 2022
MIT License

It was hard to fine-tune on other datasets. #14

Open BboyHanat opened 3 years ago

BboyHanat commented 3 years ago

I used this network for an image defect classification task. It was very hard to train and reached low accuracy, whereas other models, such as the VIP model based on an MLP architecture or a plain ResNet-50, were really easy to fine-tune on my dataset. I also adjusted my learning rate, weight decay, batch size, and data augmentation policy (to suit my data), and changed the optimizer, but none of it helped.

AlexWang1900 commented 2 years ago

Transformer-based models simply need a huge amount of data to train. On small datasets they won't work as well as CNNs.

BboyHanat commented 2 years ago

Transformer-based models simply need a huge amount of data to train. On small datasets they won't work as well as CNNs.

I think my dataset is large enough to train on. I also tried Swin Transformer, and it got a good result. Maybe I set some hyper-parameters wrong, which makes this network hard to get to converge. By the way, I think this paper conveys a good insight, and I will try it again when I have plenty of time.

ghost commented 1 year ago

Hello, has this problem been solved? I have run into a similar problem too.

BboyHanat commented 1 year ago

Hello, has this problem been solved? I have run into a similar problem too.

I don't have time to work on this, so I gave up.

wujiang0156 commented 1 year ago

@BboyHanat @AlexWang1900 My dataset is also large enough to train on. I also tried Swin Transformer, and it got a good result, but this network is hard to get to converge. On ADE20K it is not better than Swin Transformer.
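
For anyone revisiting the hyper-parameter question raised in this thread, one setting that is easy to get wrong when fine-tuning transformer backbones is the learning-rate schedule. Below is a minimal sketch of linear warmup followed by cosine decay, written in plain Python so it does not depend on any training framework. The function name `lr_at_step` and every default value (`base_lr`, `warmup_steps`, `min_lr`) are illustrative assumptions; none of them come from this thread or from the CSWin repository, and they are not a confirmed fix for the convergence problem reported here.

```python
import math


def lr_at_step(step, total_steps, base_lr=5e-5, warmup_steps=500, min_lr=1e-6):
    """Return the learning rate for a given optimizer step.

    Hypothetical helper: linear warmup up to base_lr, then cosine decay
    down to min_lr. All default values are assumptions for illustration.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if step < warmup_steps:
        # Linear warmup: ramp from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase that has elapsed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    # Cosine decay from base_lr (progress = 0) to min_lr (progress = 1).
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 10_000
    for s in (0, 250, 499, 500, 5_000, 10_000):
        print(s, lr_at_step(s, total))
```

The value returned for each step could be written into an optimizer's parameter groups before the update. Whether a schedule like this helps with CSWin specifically is untested here.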