Open heleifz opened 2 years ago
There is no other way unless you replace BN with GN. I have thought a little about the problem. The inconsistency mainly appears in the forward pass. In each sub-iteration we cannot get the statistics right, because the next sub-iteration's statistics are not available yet, so we cannot estimate the statistics of the population. I think we could cache the output after each BN layer and synchronize, as SyncBN does, but that carries a big cost. If you have time, I would be pleased to exchange views.
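To illustrate why swapping BN for GN sidesteps the problem, here is a minimal pure-Python sketch (not the actual PyTorch layers). The helper names `normalize`, `batch_norm`, `group_norm`, and `in_chunks` are made up for this example, and affine parameters and running statistics are left out. The point is only that GN computes its statistics per sample, so splitting a batch into chunks does not change its output, while BN's output depends on which samples share a chunk.

```python
from statistics import fmean, pvariance

def normalize(values, eps=1e-5):
    """Subtract the mean and divide by the standard deviation of `values`."""
    mu = fmean(values)
    var = pvariance(values, mu)
    return [(v - mu) / (var + eps) ** 0.5 for v in values]

def batch_norm(batch):
    """Normalize each channel across all samples in `batch` (training-mode BN)."""
    columns = [normalize(list(col)) for col in zip(*batch)]
    return [list(row) for row in zip(*columns)]

def group_norm(batch, num_groups):
    """Normalize each group of channels inside each sample independently."""
    out = []
    for sample in batch:
        size = len(sample) // num_groups
        normed = []
        for g in range(num_groups):
            normed.extend(normalize(sample[g * size:(g + 1) * size]))
        out.append(normed)
    return out

def in_chunks(fn, batch, chunk_size):
    """Apply `fn` to consecutive chunks of `batch` and concatenate the results."""
    out = []
    for i in range(0, len(batch), chunk_size):
        out.extend(fn(batch[i:i + chunk_size]))
    return out

# Hypothetical mini-batch: 4 samples, 4 channels each.
batch = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [0.0, 1.0, 0.0, 1.0],
    [5.0, 3.0, 1.0, 7.0],
]

gn_full = group_norm(batch, 2)
gn_chunked = in_chunks(lambda b: group_norm(b, 2), batch, 2)
bn_full = batch_norm(batch)
bn_chunked = in_chunks(batch_norm, batch, 2)

print("GN unchanged by chunking:", gn_full == gn_chunked)
print("BN unchanged by chunking:", bn_full == bn_chunked)
```

In a real model the equivalent swap would use the framework's own GroupNorm layer; whether accuracy holds up after the swap is a separate question that this sketch does not address.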
BatchNorm is very common in CV models. When training = True, the running statistics in BatchNorm layers change with every chunk.
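A rough sketch of the effect, in plain Python rather than the real BatchNorm layer: it assumes the usual exponential-moving-average update `running = (1 - momentum) * running + momentum * batch_mean` with momentum 0.1, tracks only the running mean, and uses made-up data. The function names are hypothetical.

```python
from statistics import fmean

def update_running_mean(running, batch, momentum=0.1):
    """One BatchNorm-style update of the running mean (assumed EMA rule)."""
    return (1.0 - momentum) * running + momentum * fmean(batch)

def running_mean_after(batches, momentum=0.1, init=0.0):
    """Apply one update per batch, in order, starting from `init`."""
    running = init
    for b in batches:
        running = update_running_mean(running, b, momentum)
    return running

# Hypothetical mini-batch of 8 scalar activations, split into 4 chunks of 2.
mini_batch = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
chunks = [mini_batch[i:i + 2] for i in range(0, len(mini_batch), 2)]

chunk_means = [fmean(c) for c in chunks]      # each chunk sees a different mean
full = running_mean_after([mini_batch])       # one update with the whole batch
chunked = running_mean_after(chunks)          # one update per chunk

print("per-chunk means:", chunk_means)
print("running mean, whole mini-batch:", full)
print("running mean, chunk by chunk:", chunked)
```

With these numbers the chunked run applies four updates instead of one, so the running mean ends up well away from the single-update value, and each chunk is also normalized with its own batch mean rather than the mean of the whole mini-batch.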