microsoft / Cream

This is a collection of our NAS and Vision Transformer work.

why not syncbn in TinyViT ? #198

Closed CYL0089 closed 1 year ago

CYL0089 commented 1 year ago

In the paper: [image]

but in the code: [image]

and the pretraining and fine-tuning commands (in TinyViT/docs/TRAINING.md) do not add '--use-sync-bn':

```bash
python -m torch.distributed.launch --master_addr=$MASTER_ADDR --nproc_per_node 8 --nnodes=4 --node_rank=$NODE_RANK main.py --cfg configs/22k_distill/tiny_vit_21m_22k_distill.yaml --data-path ./ImageNet-22k --batch-size 128 --output ./output --opts DISTILL.TEACHER_LOGITS_PATH ./teacher_logits/

python -m torch.distributed.launch --nproc_per_node 8 main.py --cfg configs/22kto1k/tiny_vit_21m_22kto1k.yaml --data-path ./ImageNet --batch-size 128 --pretrained ./checkpoints/tiny_vit_21m_22k_distill.pth --output ./output
```

So the batch size is actually 128?
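
For reference, my understanding of what '--use-sync-bn' would usually do in a PyTorch DDP script is something like this (a generic sketch, not the actual TinyViT main.py wiring):

```python
# Minimal sketch (assumed names) of a `--use-sync-bn` style flag in a
# PyTorch DDP training script; not the actual TinyViT main.py code.
import argparse
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

parser = argparse.ArgumentParser()
parser.add_argument('--use-sync-bn', action='store_true')
parser.add_argument('--local_rank', type=int, default=0)
args = parser.parse_args()

dist.init_process_group(backend='nccl')
torch.cuda.set_device(args.local_rank)

# Placeholder model containing BatchNorm layers.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 64, kernel_size=3),
    torch.nn.BatchNorm2d(64),
    torch.nn.ReLU(),
).cuda()

if args.use_sync_bn:
    # Replace every BatchNorm layer with SyncBatchNorm so normalization
    # statistics are aggregated across all processes instead of per GPU.
    model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)

model = DDP(model, device_ids=[args.local_rank])
```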

CYL0089 commented 1 year ago

Maybe I missed something?

wkcn commented 1 year ago

Hi @CYL0089 , thanks for your attention to our work! The batch size per GPU is 128, and the number of GPUs is 32.

We did not try enabling SyncBN, but I think a per-GPU batch size of 128 is large enough for BatchNorm.
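
(From the pretraining command above, that is 8 GPUs per node × 4 nodes = 32 GPUs, so the effective global batch size is 128 × 32 = 4096, while each GPU's BatchNorm layers normalize over their own 128 samples when SyncBN is off.)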

CYL0089 commented 1 year ago

> Hi @CYL0089 , thanks for your attention to our work! The batch size per GPU is 128, and the number of GPUs is 32.
>
> We did not try enabling SyncBN, but I think a per-GPU batch size of 128 is large enough for BatchNorm.

I see, thank you