floor instead of ceiling

DominiqueMakowski commented 5 years ago

Hi @mikemeredith,

I've noticed the commented line in HDI computation, and the replacement of Kruschke's ceiling by floor. I would be curious to know the rationale for this change ☺️

Thanks a lot!

mikemeredith commented 5 years ago

Hi Dominique,

The answer is - an accident of history. I'll explain in a moment, but first, why are they different?

Is there a difference between the `ceiling` and `floor` forms, and if so, why?

Algebraically n * (1 - credMass) == n - n * credMass, but when your computer starts dealing with real numbers, that ain't so. Let's try it:

set.seed("123", kind="Mersenne")
n <- round(runif(1e4, 1e4, 1e6))
head(n)
credMass <- 0.95

tst1 <- (n * (1 - credMass)) - (n - n * credMass)
range(tst1)
mean(tst1 != 0)

The differences are small, but < 5% are actually zero. See p.9 of The R Inferno. If the differences are so small, won't ceiling or floor fix it?

tst2 <- ceiling(n * (1 - credMass)) - (n - floor(n * credMass))
range(tst2)
mean(tst2 != 0)

Well, that's better, but there are still ~5% of cases where they differ; hence the comment "don't always give the same answer".

Should we use round instead, will that fix it?

tst3 <- round(n * (1 - credMass)) - (n - round(n * credMass))
range(tst3)
mean(tst3 != 0)

That's better, but still not perfect. It still matters which form you use.

Why did I choose `floor` instead of `ceiling`?

Long story! Back in Oct 2011, I was trying to use SPACECAP package, but at that time it Depends on TeachingDemos and that was not compatible with the newly-released R 2.14.0. In the SPACECAP source code, I found it was using TeachingDemos::emp.hpd to calculate HDIs, so I wrote my own little function from scratch to replace it, and that's still in SPACECAP.

I suspect the first versions used neither floor nor ceiling and simply fed a non-integer value as the index; R will then truncate, so the inclusion of floor makes that visible without changing the output.

In fact, TeachingDemos::emp.hpd has a bug/feature: if you ask for a CrI < 50% it gives you the complement, eg, you do emp.hpd(x, conf=0.4) and you get a 60% CrI, with no warning. And no way to actually get a 40% CrI. That may be ok for Greg Snow's teaching, but not for general use.

John Kruschke's book came out about that time and I looked at his code and initially tried ceiling, but that failed the unit tests, so I stuck with floor, even when I put together the BEST package.

Later we pulled out the hdi function and put it in a little package on its own, as Rasmus Bååth wanted to use it without having to install JAGS and I was using it in wiqid as well. At that point I spent some time getting to run faster.

I don't know which is correct: the difference will be tiny unless you have short or very lumpy vectors, and ~~the ceiling version~~ would sometimes give more conservative CrIs. I would need a good reason to change it, as that could give users different results.

Apologies! The ceiling version sometimes gives values 1 larger than floor, but that's for the number to exclude. Excluding more means narrower CrIs, so it's the floor version which is (sometimes slightly) more conservative.

Thanks for raising the question! - Mike

DominiqueMakowski commented 5 years ago

Thanks a lot for this very detailed answer!

mikemeredith commented 5 years ago

I wrote above: "the first versions used neither floor nor ceiling and simply fed a non-integer value as the index; R will then truncate". Actually the index is [1:exclude], which introduces a further gremlin. : and seq do not always truncate, they round if the nearest integer is "very close". Try this:

n <- 2.9999999 # at least seven 9's
n   # 3
floor(n) # 2
1:n # 1 2 3
LETTERS[n] # "B"
LETTERS[1:n] # "A" "B" "C"

You can get odd behaviour (ie, mysterious errors) unless you ensure only integers get in there. round or 'trunc' or whatever can save hours of debugging.

DominiqueMakowski commented 5 years ago

Thanks! we are currently trying to implement a light yet comprehensive toolbox for Bayesian indices in bayestestR. We would be happy to have your input/thoughts on its hdi function, or anything else :)

mikemeredith commented 5 years ago

Sorry, I made a mistake yesterday. It's actually the floor version which is more conservative. See edit above.

mikemeredith / HDInterval