mlpc-ucsd / CoaT

(ICCV 2021 Oral) CoaT: Co-Scale Conv-Attentional Image Transformers
Apache License 2.0

About AMP and batch size #4

Closed · youngwanLEE closed this 3 years ago

youngwanLEE commented 3 years ago

Hi, I'm very impressed by your excellent work! Thanks for sharing your code.

I have questions about the training protocol.

In your paper,

"We train all models with a global batch size of 2048 with the NVIDIA Automatic Mixed Precision(AMP) enabled."

but the training script specifies a batch size of 256 instead of 2048.

I have two questions about this:

1) Can I reproduce the reported accuracy with the command in this repo (batch size = 256 instead of 2048)?

2) Does this repo use AMP?

Thanks in advance :)

xwjabc commented 3 years ago

Hi @youngwanLEE, thank you for your interest in our work!

  1. Regarding the batch size: 256 is the batch size per GPU. The default training command uses the 8-GPU setting, which gives a total batch size of 2048. You should be able to reproduce accuracy similar to our reported results using the command provided in this repo (see the batch-size sketch below).

  2. Yes, AMP is enabled by default (see the AMP sketch below).
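
For clarity on point 1, here is a tiny sketch (not code from this repo; the variable names are hypothetical) of how the per-GPU batch size and the number of GPUs combine into the global batch size reported in the paper:

```python
# Hypothetical illustration: in multi-GPU data-parallel training, the effective
# (global) batch size is the per-GPU batch size times the number of GPUs.
per_gpu_batch_size = 256   # value passed to the training script
num_gpus = 8               # default 8-GPU setting
global_batch_size = per_gpu_batch_size * num_gpus
assert global_batch_size == 2048  # matches the global batch size stated in the paper
```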
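
For point 2, here is a minimal, hypothetical sketch of a mixed-precision training step using PyTorch's native AMP (`torch.cuda.amp`). It only illustrates what "AMP enabled" means; the actual training code in this repo may be organized differently (e.g. it may rely on NVIDIA Apex AMP instead):

```python
import torch
from torch.cuda.amp import autocast, GradScaler

# Placeholder model, optimizer, and loss; not the CoaT model or training setup.
model = torch.nn.Linear(64, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = torch.nn.CrossEntropyLoss()
scaler = GradScaler()  # scales the loss to avoid fp16 gradient underflow

def train_step(images, targets):
    optimizer.zero_grad()
    with autocast():                  # forward pass runs in mixed precision
        outputs = model(images)
        loss = criterion(outputs, targets)
    scaler.scale(loss).backward()     # backward on the scaled loss
    scaler.step(optimizer)            # unscales gradients, then optimizer step
    scaler.update()                   # adjusts the scale factor for the next step
    return loss.item()
```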

youngwanLEE commented 3 years ago

@xwjabc Thanks for your quick reply :)