Closed victorwctsang closed 1 year ago
Potential causes:
OK I've found the problem, it is none of the above 🤣
If you use
options(error=recover)
you can get into the function workspace at the point the error is produced. I did this, entering the estimate_quantile.minmi
workspace. You can see in this workspace that at the time the error was produced q was 0.975. I then ran:
thetas = seq(10000,25000,length=100)
eeqns = rep(NA,length(thetas))
for (iTheta in 1:length(thetas))
eeqns[iTheta] = estimating_eqn(thetas[iTheta], q, K, u, m, n, eps.sigma)
plot(thetas,eeqns,type="l")
abline(a=0,b=0,col="red")
Note it is monotonically increasing towards zero.
max(eeqns,na.rm=TRUE)
But it never got to zero. The problem appears to be that we got a NaN error before we got to the solution (which is actually there if we could get to it). Getting inside estimating_eqn
to see what is happening:
theta = 2200
F.eps.m <- pnorm(m - theta, mean = 0, sd = eps.sigma)
F.eps.K <- pnorm(K - theta, mean = 0, sd = eps.sigma)
e <- uniform_to_tnorm(u, eps.sigma, a = -Inf, b = m - theta)
psi.hat <- apply((m - e - theta) / (K - e - theta), MARGIN = 1, FUN=mean)
and we can store the key vector in the estimating equation as
ee = log(1 - F.eps.m / F.eps.K * psi.hat)
print(ee)
and look at its values. Notice that one of the values is NaN:
whichNA=which(is.na(ee))
it is the 27th value. Notice also that for any reasonable value of theta, all values of ee
are small except for one:
whichBig=which(ee==min(ee))
is the 33rd value. What is different about these values... well looking at eps.sigma
summary(eps.sigma)
it has a very wide range of values, going from 194 up to 4783. The value with big ee
is the one with the largest sd (almost double everything else) and value that gives an NA for ee
is the one with smallest sd. So the problem seems to be that when given extreme ranges of sds (these ones look like they have an impractically wide range to me - check with the data we have but I think they tend to vary less than this), we encounter machine error computing ee
contributions for each observation before we get to the solution.
Looking more deeply to see where the NA comes from, it looks like its source is the uniform_to_tnorm
function when asked to truncate normal values a long way from the origin when sd is small. In these cases however psi.hat is one anyway so I would suggest replacing these values with one where they occur, along the lines of
psi.hat[(m-theta)/eps.sigma<-XXX]=1
we could go with
return(sum(log(1 - F.eps.m / F.eps.K * psi.hat), na.rm=TRUE) - log(q))
but that is more dangerous because there might be other reasons for NaN other than truncating a normal at a ridiculously small value
to escape error recovery in future work type
options(error=NULL)
@victorwctsang please check you agree and close if appropriate
rror message is:
Reproducing the error
Code to reproduce the error:
Additional information:
synthetic-data.RData
:synthetic.data.config
: these are the constants used to generate the synthetic data.datasets
: these are the synthetic datasets. For each dataset, there is a fossil record W with $n = $n.samples
and the error factor used. All datasets used the same set of $\sigma$'s, which are stored insynthetic.data.config
asfossil.sd
cc @dwarton