Thanks for your reply. I tried to merge your code into timm, but I can't reproduce your accuracy. Did I miss something?
```bash
spring.submit arun --gpu -n16 \
    "python train.py /data/images --opt adamw --weight-decay 0.05 \
        --lr 1.6e-3 --warmup-lr 1e-6 --min-lr 1e-5 --decay-epochs 30 \
        --warmup-epochs 5 --reprob 0.25 --model lvvit_s -b 64 --apex-amp \
        --img-size 224 --drop-path 0.1 --token-label \
        --token-label-data /data/label_top5_train_nfnet --token-label-size 14 \
        --model-ema --model-ema-decay 0.9992 -j 8"
```
Can you check whether you have random augmentation (--aa) correctly enabled?
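For reference, a typical timm-style RandAugment flag looks like the line below; the policy string `rand-m9-mstd0.5-inc1` is a common timm recipe default and is an assumption here, not a value confirmed in this thread.

```bash
# Assumed example: enabling timm-style RandAugment; the policy string
# rand-m9-mstd0.5-inc1 is a common timm default, not confirmed here.
python train.py /data/images --model lvvit_s -b 64 --aa rand-m9-mstd0.5-inc1
```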
Thanks a lot. I will try again :) Also, I found that the line below differs from timm; it should probably use rank instead of local_rank. Otherwise, we may encounter conflicting model writes when training across multiple machines (e.g., with more than 8 GPUs).
This is because your two machines are on the same file system. In this case, you only need to save the model from one process (i.e., guard the save with the args.rank == 0 condition).
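As a minimal sketch of that guard (assuming a torch.distributed setup where args.rank holds the global rank, i.e. the value returned by dist.get_rank()):

```python
import torch
import torch.distributed as dist

def save_checkpoint(model, path):
    """Write the checkpoint from the global rank-0 process only."""
    # The local rank is 0 on every machine, so guarding on it would
    # trigger one write per node and conflict on a shared filesystem;
    # the global rank is unique across all processes.
    if not dist.is_initialized() or dist.get_rank() == 0:
        torch.save(model.state_dict(), path)
    if dist.is_initialized():
        dist.barrier()  # let the other ranks wait for the write to finish
```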
You can refer to the training log of LV-ViT-S here: https://github.com/zihangJiang/TokenLabeling/issues/17#issuecomment-917027674.