luyug / GradCache

Run Effective Large Batch Contrastive Learning Beyond GPU/TPU Memory Constraint
Apache License 2.0

How to handle BatchNorm ? #15

Open heleifz opened 2 years ago

heleifz commented 2 years ago

BatchNorm is very common in CV models. When `training = True`, the running statistics in the BatchNorm layers change on every chunk, so the chunked forward passes are not consistent with a single full-batch forward.
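To illustrate the issue, here is a minimal pure-Python sketch (not GradCache code) of how a BatchNorm running mean, updated with the usual momentum rule, diverges when the same batch is processed in chunks instead of all at once:

```python
momentum = 0.1  # PyTorch's default BatchNorm momentum

def update(running, batch_mean):
    # Exponential moving average update used for BN running statistics
    return (1 - momentum) * running + momentum * batch_mean

# One full-batch forward: a single running-stat update
full = [1.0, 2.0, 3.0, 4.0]
rm_full = update(0.0, sum(full) / len(full))  # 0.25

# The same data split into two chunks (as a chunked forward would see it):
# the running mean is updated once per chunk, with per-chunk means
chunks = [[1.0, 2.0], [3.0, 4.0]]
rm_chunked = 0.0
for c in chunks:
    rm_chunked = update(rm_chunked, sum(c) / len(c))  # 0.15, then 0.485

# rm_full = 0.25 vs rm_chunked = 0.485: the statistics diverge
```

Beyond the running statistics, the normalization itself also differs, since each chunk is normalized with its own chunk mean/variance rather than the full-batch statistics.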

zzk2021 commented 1 year ago

There is no way around this unless you replace BN with GN. I have thought a bit about the problem. The inconsistency mainly appears in the forward pass: in each sub-iteration we cannot get the statistics right, because we cannot see the statistics of the chunks that have not been processed yet, so we cannot estimate the population statistics. One idea is to cache the outputs around the BN layers and synchronize them, like SyncBN does, but that carries a big cost. If you have time, I would be pleased to exchange views.
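The SyncBN-like idea above can be sketched as a two-pass scheme (this is only an illustration of the concept, not GradCache or SyncBN code): a first pass accumulates batch statistics over all chunks, and a second pass normalizes every chunk with those shared statistics, so each chunk sees the same mean and variance a full-batch forward would:

```python
# Data split into chunks, as a chunked (gradient-cached) forward would see it
chunks = [[1.0, 2.0], [3.0, 4.0]]

# Pass 1: accumulate global batch statistics across all chunks
n = sum(len(c) for c in chunks)
mean = sum(x for c in chunks for x in c) / n
var = sum((x - mean) ** 2 for c in chunks for x in c) / n

# Pass 2: normalize each chunk with the shared full-batch statistics
eps = 1e-5  # numerical-stability term, as in standard BN
normalized = [[(x - mean) / (var + eps) ** 0.5 for x in c] for c in chunks]
```

The cost the comment mentions is visible here: every BN layer needs an extra synchronization point (a full first pass, or cached activations) before any chunk can produce its final output, which is why swapping BN for GroupNorm, whose statistics are per-sample, is the simpler fix.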