Tried to train with your architecture. Regularization doesn't help; the artifacts are just like yours. I didn't try sigmoid, though, but I understand why you don't want it.
Or, batch normalization could produce such a result if the kernel is too large, but that does not seem to be the case here. The question was why you didn't use regularization. It seems that with it, the result comes out gray. l1_l2(.00001, .00001) seems to be okay.
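For reference, a minimal sketch of attaching an `l1_l2` kernel regularizer to a convolutional layer in Keras, using the penalty values from the comment above; the filter count and kernel size are placeholders, not the architecture discussed in this issue:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# Only the regularizer strengths mirror the comment above; the rest is hypothetical.
reg = regularizers.l1_l2(l1=1e-5, l2=1e-5)

conv = layers.Conv2D(
    filters=64,              # placeholder filter count
    kernel_size=3,           # placeholder kernel size
    padding="same",
    activation="relu",
    kernel_regularizer=reg,  # penalizes large kernel weights during training
)
```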
I think the artifacts come from the BN layers (the mismatch between the estimated mean and variance and the true statistics). L1 and L2 regularization may not be effective at removing them. Simply removing the BN layers from the network architecture can reduce the artifacts without affecting the final performance.
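As an illustration, a hedged sketch of a conv block with BN made optional; the specific layers here are assumptions for demonstration, not the repository's actual architecture:

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters, use_bn=False):
    """Hypothetical conv block; pass use_bn=False to drop the BN layer
    that the comment above suggests is the source of the artifacts."""
    x = layers.Conv2D(filters, 3, padding="same", use_bias=not use_bn)(x)
    if use_bn:
        # At inference BN uses running estimates of mean/variance,
        # which is where the mismatch described above can show up.
        x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)
```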
Because everyone else does. Also, there are fewer of these artifacts with sigmoid activation.