As described in this paper, the noise(sigma) increases, the respected L(W) decreases. But if we understand sigma as uncertainty of y,
maybe it's better for L to be increase with uncertainty, because it means y is harder to learn, so it needs more attention to learn?
As described in this paper, the noise(sigma) increases, the respected L(W) decreases. But if we understand sigma as uncertainty of y, maybe it's better for L to be increase with uncertainty, because it means y is harder to learn, so it needs more attention to learn?