suzannastep commented 2 years ago

The function my_e2truncnorm in R/truncnorm.R will sometimes return negative estimates of the second moment of the truncated standard normal distribution. I have attached two files to reproduce this error, along with the current negative second moments.

Both use mean \approx 54000, sd \approx 2.97. The first file truncates to (0,Inf), and the second file truncates to (-Inf,0). Most of the negative second moments are small (~1e-7), but some are fairly large (~1e+1).

suzannastep commented 2 years ago

Code to load examples:

nsm1 <- data.frame(a = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
b = c(Inf, Inf, Inf, 
Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, 
Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, 
Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, 
Inf, Inf, Inf, Inf, Inf, Inf, Inf),
mean = c(-54788.9937492435, 
-54788.7495783716, -54795.061172919, -54786.56403001, -54787.0710507828, 
-54793.8411648124, -54791.1033322952, -54790.8850621429, -54792.4474633392, 
-54789.104838451, -54790.0716305659, -54791.8116771782, -54792.2405734505, 
-54793.2180670531, -54795.066790929, -54792.5099482951, -54790.3826065817, 
-54791.77833505, -54791.1393162631, -54791.3704836432, -54792.7546212792, 
-54790.3941786588, -54790.5148148196, -54783.961766733, -54787.61580626, 
-54787.9756696761, -54793.3780437052, -54793.8414409467, -54788.5799790508, 
-54796.3350585025, -54787.6400823768, -54790.1239781587, -54790.5671618815, 
-54790.5850620408, -54786.7898117188, -54788.4238087205, -54791.6176927468, 
-54788.288365944, -54794.9353162132, -54788.9025164062, -54794.1065287576, 
-54788.5087379015, -54791.4541413838, -54789.3062131891, -54789.2876504587, 
-54790.4616344339, -54790.0682324217, -54787.6989406089, -54793.0147070028), 
sd = c(2.97496459681126, 2.97496459681126, 2.97496459681126, 
2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
2.97496459681126, 2.97496459681126),
second_moments = c(-17.6492352485657, 
-4.76837158203125e-07, -9.5367431640625e-07, -11.7154006958008, 
-9.5367431640625e-07, -4.76837158203125e-07, -12.9443755149841, 
-4.76837158203125e-07, -4.76837158203125e-07, -4.76837158203125e-07, 
-4.76837158203125e-07, -1.43051147460938e-06, -1.43051147460938e-06, 
-17.6404252052307, -9.5367431640625e-07, -4.76837158203125e-07, 
-4.76837158203125e-07, -4.76837158203125e-07, -16.2378377914429, 
-4.76837158203125e-07, -4.76837158203125e-07, -4.76837158203125e-07, 
-4.76837158203125e-07, -4.76837158203125e-07, -4.76837158203125e-07, 
-4.76837158203125e-07, -14.6725268363953, -4.76837158203125e-07, 
-4.76837158203125e-07, -1.43051147460938e-06, -14.370005607605, 
-9.5367431640625e-07, -4.76837158203125e-07, -4.76837158203125e-07, 
-9.5367431640625e-07, -13.1242661476135, -12.6084485054016, -17.5787625312805, 
-4.76837158203125e-07, -4.76837158203125e-07, -1.43051147460938e-06, 
-14.3491959571838, -4.76837158203125e-07, -4.76837158203125e-07, 
-17.066424369812, -14.5285978317261, -4.76837158203125e-07,
-9.5367431640625e-07, -4.76837158203125e-07))
rownames(nsm1) <- c("Row700", "Row702", "Row704", 
"Row705", "Row708", "Row712", "Row715", "Row716", "Row717", "Row719", 
"Row720", "Row721", "Row725", "Row727", "Row728", "Row729", "Row730", 
"Row732", "Row733", "Row736", "Row737", "Row739", "Row740", "Row744", 
"Row745", "Row748", "Row749", "Row751", "Row753", "Row754", "Row760", 
"Row763", "Row764", "Row765", "Row766", "Row767", "Row768", "Row769", 
"Row770", "Row774", "Row775", "Row776", "Row779", "Row781", "Row786", 
"Row789", "Row793", "Row795", "Row799")
nsm2 <- data.frame(a = c(-Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, 
-Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, 
-Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, 
-Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, 
-Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf), 
    b = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
mean = c(54793.6976016185, 
    54789.2789114755, 54797.7760543845, 54797.2690336117, 54796.5650050414, 
    54793.2367520993, 54791.8926210553, 54795.2352459435, 54794.2684538286, 
    54793.5968372019, 54792.099510944, 54792.2584346181, 54789.2732934655, 
    54791.8301360994, 54795.6222604304, 54791.377209057, 54792.013046847, 
    54793.9805908811, 54792.3089689607, 54796.7242781345, 54796.3644147184, 
    54790.9620406893, 54793.7143621958, 54794.9249914674, 54788.005025892, 
    54794.9131273272, 54794.0087762809, 54796.7000020177, 54789.1452011907, 
    54794.2161062358, 54792.7223916477, 54796.0517184506, 54789.4047681813, 
    54795.4375679883, 54790.2335556369, 54795.831346493, 54792.0057657992, 
    54785.7805432566, 54795.0338712054, 54791.1719658525, 54795.7868262848, 
    54790.8163913233, 54795.0524339358, 54793.8784499606, 54787.1880161754, 
    54792.0559254679, 54794.4767477245, 54796.6411437856, 54795.1046280564, 
    54791.3253773917),
sd = c(2.97496459681126, 2.97496459681126, 
    2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
    2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
    2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
    2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
    2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
    2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
    2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
    2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
    2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
    2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
    2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
    2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126), 
second_moments = c(-4.76837158203125e-07, -9.5367431640625e-07, 
    -4.76837158203125e-07, -4.76837158203125e-07, -4.76837158203125e-07, 
    -4.76837158203125e-07, -9.5367431640625e-07, -9.5367431640625e-07, 
    -9.5367431640625e-07, -13.3077096939087, -4.76837158203125e-07, 
    -4.76837158203125e-07, -4.76837158203125e-07, -1.43051147460938e-06, 
    -4.76837158203125e-07, -4.76837158203125e-07, -4.76837158203125e-07, 
    -13.4651703834534, -4.76837158203125e-07, -4.76837158203125e-07, 
    -9.5367431640625e-07, -4.76837158203125e-07, -10.5989608764648, 
    -9.5367431640625e-07, -1.43051147460938e-06, -9.5367431640625e-07, 
    -4.76837158203125e-07, -4.76837158203125e-07, -9.5367431640625e-07, 
    -9.5367431640625e-07, -9.5367431640625e-07, -9.5367431640625e-07, 
    -9.5367431640625e-07, -9.5367431640625e-07, -4.76837158203125e-07, 
    -9.5367431640625e-07, -4.76837158203125e-07, -4.76837158203125e-07, 
    -13.0829844474792, -15.7729797363281, -9.5367431640625e-07, 
    -9.5367431640625e-07, -4.76837158203125e-07, -9.5367431640625e-07, 
    -9.5367431640625e-07, -4.76837158203125e-07, -9.5367431640625e-07, 
    -9.5367431640625e-07, -1.43051147460938e-06, -11.2983322143555))
rownames(nsm2) <- c("Row701", "Row704", "Row705", "Row708", 
"Row709", "Row715", "Row717", "Row719", "Row720", "Row723", "Row725", 
"Row726", "Row728", "Row729", "Row731", "Row735", "Row738", "Row741", 
"Row743", "Row745", "Row748", "Row749", "Row750", "Row752", "Row754", 
"Row756", "Row757", "Row760", "Row762", "Row763", "Row768", "Row769", 
"Row770", "Row774", "Row775", "Row776", "Row777", "Row780", "Row781", 
"Row782", "Row783", "Row785", "Row786", "Row789", "Row790", "Row792", 
"Row794", "Row795", "Row796", "Row799")

Code to reproduce error:

library(ashr)
my_e2truncnorm(nsm1$a,nsm1$b,nsm1$mean,nsm1$sd)
my_e2truncnorm(nsm2$a,nsm2$b,nsm2$mean,nsm2$sd)

Using dput seems to have introduced some rounding that changes the output. However, this code will still produce negative second moments, so the gist of this example is unchanged. I also have the original values saved as a RDS file, but I cannot upload to github issues

stephens999 commented 2 years ago

an even simpler version extracted from your example: my_e2truncnorm(0,Inf, -54793.7, 3) gives [1] -13.65669

Interesting it is very sensitive to the mean. my_e2truncnorm(0,Inf, -55000, 3) gives [1] 4.768372e-07

stephens999 commented 2 years ago

this code from https://github.com/cossio/TruncatedNormal.jl/blob/3c16866c3afa3920e787513d492689e9e81192ca/src/tnmom2.jl#L66-L71 may be helpful:

Second moment of the truncated standard normal distribution. """ function tnmom2(a::Real, b::Real)

return tnmom2c(0, a, b)

if !(a ≤ b)
    return oftype(middle(a, b), NaN)
elseif a == b
    return middle(a, b)^2
elseif abs(a) > abs(b)
    return tnmom2(-b, -a)
elseif isinf(a) && isinf(b)
    return one(middle(a, b))
elseif isinf(b)
    return 1 + √(2 / π) * a / erfcx(a / √2)
end

@assert a < b < Inf && abs(a) ≤ abs(b)
@assert a ≤ 0 ≤ b || 0 ≤ a ≤ b

if a ≤ 0 ≤ b
    ea = √(π/2) * erf(a / √2)
    eb = √(π/2) * erf(b / √2)
    fa = ea - a * exp(-a^2 / 2)
    fb = eb - b * exp(-b^2 / 2)
    m2 = (fb - fa) / (eb - ea)
    fb ≥ fa && eb ≥ ea || error("error: a=$a, b=$b")
    0 ≤ m2 ≤ 1 || error("error: a=$a, b=$b")
    return m2
else # 0 ≤ a ≤ b
    exΔ = exp((a - b)middle(a, b))
    ea = √(π/2) * erfcx(a / √2)
    eb = √(π/2) * erfcx(b / √2)
    fa = ea + a
    fb = eb + b
    m2 = (fa - fb * exΔ) / (ea - eb * exΔ)
    a^2 ≤ m2 ≤ b^2 || error("error: a=$a, b=$b")
    return m2
end

end

pcarbo commented 2 years ago

@suzannastep @stephens999 Let me know if I can help with this.

suzannastep commented 2 years ago

There are numerical problems here happening in the (r equivalent of) the line

μ^2 + σ^2 tnmom2(α, β) + 2μ σ * tnmean(α, β)

Using the julia code,

tnmom2(0.0,Inf, -54793.7, 3.0)
μ^2 + σ^2 * tnmom2(α, β) = 6.004699137379999e9
2μ * σ * tnmean(α, β) = -6.004699137379999e9

So the final answer is exactly 0.0 . Like exactly, as in

tnmom2(0.0,Inf, -54793.7, 3.0) == 0
> true

Similarly,

tnmom2(0.0,Inf, -55000, 3.0)
μ^2 + σ^2 * tnmom2(α, β) = 6.050000018e9
2μ * σ * tnmean(α, β) = -6.050000018e9
tnmom2(0.0,Inf, -55000, 3.0) == 0
> true

This is a perfect storm for numerical problems-- differences of large numbers that are almost equal. Using the r code, we’re getting catastrophic cancellation. Any issues in computing μ^2 + σ^2 * tnmom2(α, β) or 2μ * σ * tnmean(α, β) will mean that the difference will have huge absolute error. And that's what's happening.

my_e2truncnorm(0,Inf, -54793.7, 3)
μ^2 + σ^2 * tnmom2(α, β) = 6004699123.72331
2μ * σ * tnmean(α, β) = -6004699137.38
[1] -13.65669

In actuality, the rcode reorders the sum slightly, so it's more like

my_e2truncnorm(0,Inf, -54793.7, 3)
μ^2 + 2μ * σ * tnmean(α, β)= -3002349577.69
σ^2 * tnmom2(α, β)  = 3002349564.03331
[1] -13.65669

On the other hand,

my_e2truncnorm(0,Inf, -55000, 3)
μ^2 + 2μ * σ * tnmean(α, β)= -3025000018
σ^2 * tnmom2(α, β)  = 3025000018
[1] 4.768372e-07

It looks like the issue in the rcode is coming from the relative error in σ^2 * tnmom2(α, β), which is ~1e-9. The relative error in the other term is ~1e-16

stephens999 / ashr

Negative Estimates of Second Moments #128

return tnmom2c(0, a, b)