Tried to train with your architecture. Regularization doesn't help; the artifacts are just like yours. I didn't try sigmoid, though, but I understand why you don't want it.
Or, batch normalization could produce such a result if the kernel is too large, but that does not seem to be the case here. The question was why you didn't use regularization. It seems that with it, the result comes out gray. l1_l2(.00001, .00001) seems to be okay.
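For reference, a minimal sketch of attaching an `l1_l2` kernel regularizer to a convolutional layer in Keras, using the penalty values from the comment above; the filter count and kernel size are placeholders, not the architecture discussed in this issue:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# Only the regularizer strengths mirror the comment above; the rest is hypothetical.
reg = regularizers.l1_l2(l1=1e-5, l2=1e-5)

conv = layers.Conv2D(
    filters=64,              # placeholder filter count
    kernel_size=3,           # placeholder kernel size
    padding="same",
    activation="relu",
    kernel_regularizer=reg,  # penalizes large kernel weights during training
)
```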
I think the artifacts come from the BN layers (the mismatch between the estimated mean and variance and the true statistics). L1 and L2 regularization may not be effective at removing them. Simply removing the BN layers from the network architecture can reduce the artifacts without affecting the final performance.
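As an illustration, a hedged sketch of a conv block with BN made optional; the specific layers here are assumptions for demonstration, not the repository's actual architecture:

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters, use_bn=False):
    """Hypothetical conv block; pass use_bn=False to drop the BN layer
    that the comment above suggests is the source of the artifacts."""
    x = layers.Conv2D(filters, 3, padding="same", use_bias=not use_bn)(x)
    if use_bn:
        # At inference BN uses running estimates of mean/variance,
        # which is where the mismatch described above can show up.
        x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)
```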
Because everyone else does. Also, there are fewer of these artifacts with sigmoid activation.