Closed DominiqueMakowski closed 5 years ago
Hi Dominique,
The answer is: an accident of history. I'll explain in a moment, but first: is there a difference between the `ceiling` and `floor` forms, and if so, why?
Algebraically, `n * (1 - credMass) == n - n * credMass`, but once your computer starts dealing with floating-point numbers, that ain't so. Let's try it:
set.seed(123, kind = "Mersenne-Twister")
n <- round(runif(1e4, 1e4, 1e6))
head(n)
credMass <- 0.95
tst1 <- (n * (1 - credMass)) - (n - n * credMass)
range(tst1)
mean(tst1 != 0)
The differences are tiny, but fewer than 5% of them are exactly zero. See p. 9 of The R Inferno.
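The root cause shows up even without vectors. As a minimal sketch (an illustration, not taken from the package code): 0.95 has no exact binary representation, so `1 - credMass` is not the same double as `0.05`. An exact comparison with `==` fails where a tolerance-based comparison with `all.equal` succeeds.

```r
credMass <- 0.95

# 0.95 cannot be stored exactly in binary floating point,
# so 1 - credMass is not the double closest to 0.05.
(1 - credMass) == 0.05                   # FALSE: exact comparison
isTRUE(all.equal(1 - credMass, 0.05))    # TRUE: comparison within a tolerance

# Print more digits than the default to see the discrepancy.
sprintf("%.20f", c(1 - credMass, 0.05))
```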
If the differences are so small, won't `ceiling` or `floor` fix it?
tst2 <- ceiling(n * (1 - credMass)) - (n - floor(n * credMass))
range(tst2)
mean(tst2 != 0)
Well, that's better, but there are still ~5% of cases where they differ; hence the comment "don't always give the same answer".
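To see which sample sizes are affected, the comparison can be repeated on a regular grid of `n`. This is only a sketch; the names `n_grid`, `excl_ceiling` and `excl_floor` are introduced here for illustration. The reasoning is that tiny floating-point errors can only push `floor` or `ceiling` across an integer when `n * (1 - credMass)` is algebraically a whole number, which for `credMass = 0.95` means `n` is a multiple of 20.

```r
credMass <- 0.95
n_grid <- 1:10000                  # illustrative range of sample sizes

# The two algebraically equivalent ways of computing the number to exclude.
excl_ceiling <- ceiling(n_grid * (1 - credMass))
excl_floor   <- n_grid - floor(n_grid * credMass)
differs <- excl_ceiling != excl_floor

mean(differs)                      # proportion of sample sizes where the forms disagree
all(n_grid[differs] %% 20 == 0)    # do all mismatches fall on multiples of 20?
table(excl_ceiling - excl_floor)   # size of the disagreement
```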
Should we use `round` instead? Will that fix it?
tst3 <- round(n * (1 - credMass)) - (n - round(n * credMass))
range(tst3)
mean(tst3 != 0)
That's better, but still not perfect. It still matters which form you use.
Why `floor` instead of `ceiling`?
Long story! Back in Oct 2011, I was trying to use the `SPACECAP` package, but at that time it had `Depends` on `TeachingDemos`, which was not compatible with the newly released R 2.14.0. In the `SPACECAP` source code, I found it was using `TeachingDemos::emp.hpd` to calculate HDIs, so I wrote my own little function from scratch to replace it, and that's still in `SPACECAP`.
I suspect the first versions used neither `floor` nor `ceiling` and simply fed a non-integer value as the index; R will then truncate, so the inclusion of `floor` makes that visible without changing the output.
In fact, `TeachingDemos::emp.hpd` has a bug/feature: if you ask for a CrI < 50% it gives you the complement, e.g., you do `emp.hpd(x, conf=0.4)` and you get a 60% CrI, with no warning, and no way to actually get a 40% CrI. That may be ok for Greg Snow's teaching, but not for general use.
John Kruschke's book came out about that time. I looked at his code and initially tried `ceiling`, but that failed the unit tests, so I stuck with `floor`, even when I put together the `BEST` package.
Later we pulled out the `hdi` function and put it in a little package on its own, as Rasmus Bååth wanted to use it without having to install JAGS and I was using it in `wiqid` as well. At that point I spent some time getting it to run faster.
I don't know which is correct: the difference will be tiny unless you have short or very lumpy vectors, and the `ceiling` version would sometimes give more conservative CrIs. I would need a good reason to change it, as that could give users different results.
Apologies! The `ceiling` version sometimes gives values 1 larger than `floor`, but that's for the number to exclude. Excluding more means narrower CrIs, so it's the `floor` version which is (sometimes slightly) more conservative.
Thanks for raising the question! - Mike
Thanks a lot for this very detailed answer!
I wrote above: "the first versions used neither `floor` nor `ceiling` and simply fed a non-integer value as the index; R will then truncate". Actually the index is `[1:exclude]`, which introduces a further gremlin. `:` and `seq` do not always truncate; they round if the nearest integer is "very close". Try this:
n <- 2.9999999 # at least seven 9's
n # 3
floor(n) # 2
1:n # 1 2 3
LETTERS[n] # "B"
LETTERS[1:n] # "A" "B" "C"
You can get odd behaviour (i.e., mysterious errors) unless you ensure only integers get in there. `round` or `trunc` or whatever can save hours of debugging.
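One way to keep such values out, sketched here as an illustration rather than a recommendation from the package: make the truncation explicit, store a true integer, and build the sequence with `seq_len`. Single-element indexing and sequence indexing then agree.

```r
n <- 2.9999999                  # non-integer value that prints as 3

# Make the truncation explicit and store a true integer.
idx <- as.integer(floor(n))

idx                             # 2
LETTERS[idx]                    # "B"
LETTERS[seq_len(idx)]           # "A" "B"
```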
Thanks! We are currently trying to implement a light yet comprehensive toolbox for Bayesian indices in bayestestR. We would be happy to have your input/thoughts on its hdi function, or anything else :)
Sorry, I made a mistake yesterday. It's actually the `floor` version which is more conservative. See edit above.
Hi @mikemeredith,
I've noticed the commented line in the HDI computation, and the replacement of Kruschke's `ceiling` by `floor`. I would be curious to know the rationale for this change ☺️
Thanks a lot!