Just wanted to make sure my understanding is right. Looking at the formulas, in the edge case, maximizing the probability for x = -1 amounts to maximizing this function:
logProb(μ, σ) = (-1 - μ)/σ - log(1 + e^((-1 - μ)/σ))
Which means that at the optimum μ -> -infinity.
And this works because at inference time the predicted values are clipped to the effective pixel range?
How big is the impact on the neighboring edge values, for which the optimal μ is -0.997, etc.?
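For reference, here is a quick numeric check of the formula exactly as written above, in plain Python. It is only a sketch: the function name `log_prob_edge` and the value `sigma = 0.05` are my own assumptions for illustration, not taken from any implementation. It evaluates logProb at x = -1 for a few decreasing values of μ.

```python
import math

def log_prob_edge(mu, sigma, x=-1.0):
    """logProb(mu, sigma) = z - log(1 + e^z), with z = (x - mu) / sigma."""
    z = (x - mu) / sigma
    # Algebraically equal to z - log(1 + e^z), but stable for large |z|:
    # z - log(1 + e^z) = min(z, 0) - log(1 + e^(-|z|))
    return min(z, 0.0) - math.log1p(math.exp(-abs(z)))

sigma = 0.05  # hypothetical scale, chosen only for illustration
mus = [-0.9, -1.0, -1.5, -3.0, -10.0]  # mu moving further below the edge
values = [log_prob_edge(mu, sigma) for mu in mus]

for mu, value in zip(mus, values):
    print(f"mu = {mu:6.2f}   logProb = {value:.6g}")
```

If my reading is right, the printed values should keep rising toward 0 as μ decreases and never reach a maximum at a finite μ, which is the μ -> -infinity behavior described above.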