raoyongming / DynamicViT

[NeurIPS 2021] [T-PAMI] DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification
https://dynamicvit.ivg-research.xyz/
MIT License

Reproduce the result on single gpu #41

Closed. King4819 closed this issue 8 months ago.

King4819 commented 8 months ago

I want to ask whether the method performs well when training on a single GPU. Are there any hyperparameter settings we need to adjust when training on a single GPU? Thanks!

raoyongming commented 8 months ago

Hi @King4819, to keep the training process consistent with multi-GPU training, you need to keep the overall batch size the same. You can add the "--update_freq 8" flag to accumulate gradients over 8 steps.
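For reference, a minimal sketch of what gradient accumulation does in a plain PyTorch loop (the toy model, data, and variable names below are illustrative, not the repository's actual training code; the value 8 just mirrors `--update_freq 8`):

```python
import torch
import torch.nn as nn

# Toy stand-ins for the real model/data; only the accumulation pattern matters here.
model = nn.Linear(16, 10)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data_loader = [(torch.randn(32, 16), torch.randint(0, 10, (32,))) for _ in range(16)]

update_freq = 8  # accumulate gradients over 8 mini-batches to mimic 8 GPUs

optimizer.zero_grad()
for step, (images, targets) in enumerate(data_loader):
    loss = criterion(model(images), targets) / update_freq  # scale so the summed gradient matches the large-batch one
    loss.backward()                                          # gradients add up in .grad across steps
    if (step + 1) % update_freq == 0:
        optimizer.step()       # one weight update per update_freq mini-batches
        optimizer.zero_grad()
```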

King4819 commented 8 months ago

> Hi @King4819, to keep the training process consistent with multi-GPU training, you need to keep the overall batch size the same. You can add the "--update_freq 8" flag to accumulate gradients over 8 steps.

Excuse me, may I ask more explicitly?

If your experiment used --batch_size 256 on 8 GPUs, do I have to keep --batch_size 256 and add "--update_freq 8"?

Thanks!

raoyongming commented 8 months ago

Yes, just keep the product batch_size_per_gpu × num_gpus × update_freq the same.
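For example, with the numbers from this thread, the effective batch size works out the same (a quick check, assuming the product rule above):

```python
# effective batch size = batch_size_per_gpu * num_gpus * update_freq
reference_8gpu = 256 * 8 * 1   # original multi-GPU recipe
single_gpu     = 256 * 1 * 8   # single GPU with --update_freq 8
assert reference_8gpu == single_gpu == 2048
```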

King4819 commented 8 months ago

> Yes, just keep the product batch_size_per_gpu × num_gpus × update_freq the same.

Excuse me, I want to ask whether the "--batch_size" hyperparameter specifies the total batch size or the batch size on each GPU.

For example, when setting --batch_size 256 and --num_gpu 8, does each GPU get a batch size of 256, or 256/8?

Thanks!

raoyongming commented 8 months ago

"--batch_size" here is the batch size on each GPU.
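To make that concrete, here is an illustrative PyTorch sketch (not the repository's actual data-loading code) of how a per-GPU batch size typically behaves in a DDP-style setup, where each process builds its own loader:

```python
import torch
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

# Illustrative only: in a typical DDP setup each process builds its own DataLoader,
# so batch_size is per process (per GPU) and the total batch is batch_size * world_size.
dataset = TensorDataset(torch.randn(4096, 16), torch.randint(0, 10, (4096,)))
batch_size = 256           # per-GPU value, as passed via --batch_size
world_size = 8             # number of GPUs / processes

sampler = DistributedSampler(dataset, num_replicas=world_size, rank=0)
loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)

print(len(next(iter(loader))[0]))   # 256 samples on this rank
print(batch_size * world_size)      # 2048 samples per optimizer step across all ranks
```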

King4819 commented 8 months ago

Thanks for your reply !