[Open] Ilia-Kosenkov opened this issue 4 years ago
Thanks @Ilia-Kosenkov. Yeah, this is still an issue, and the patch (to increment thin) was a bit of a hack. I'm planning a bigger edit to the samplers soon that will do thinning and sampler tuning on the TensorFlow side, rather than in R. That should resolve this issue and make the warmup period faster too.
It's worth pointing out that if you are using HMC, it's very rarely worth thinning. Instead, I'd recommend increasing the number of leapfrog steps in the HMC sampler, i.e. something like `mcmc(model, sampler = hmc(Lmin = 10, Lmax = 15))`. Increasing the number of leapfrog steps has the same proportional impact on run time as increasing the number of samples and thinning by the same ratio (i.e. doubling the number of leapfrog steps takes as long to evaluate as doubling the number of samples and thinning by 2), but with more leapfrog steps the sampler is better able to explore the posterior. So you'll generally do better changing that parameter rather than the thinning.
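As a rough sketch of that equivalence (assuming an existing greta model `m`; the leapfrog step counts are illustrative, not tuned):

```r
library(greta)

# Option A: more leapfrog steps, no thinning
draws_a <- mcmc(m, sampler = hmc(Lmin = 20, Lmax = 30))

# Option B: fewer leapfrog steps, thin by 2 (keep every 2nd draw).
# Roughly the same cost per retained sample as Option A,
# but Option A explores the posterior better.
draws_b <- mcmc(m, sampler = hmc(Lmin = 10, Lmax = 15), thin = 2)
```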
Automatic tuning of the number of leapfrog steps should be in the next minor release, as should the new NUTS implementation, which adapts the number of leapfrog steps during sampling.
Thanks for the advice, @goldingn! I understand that thinning is not worth tuning, but you can still get a crash if you tune `pb_update` (I think); that's why I looked into what happens at different `thin`/`pb_update` combinations. I'm not sure how different the warmup-phase implementation is from sampling, but it is indeed slower, maybe by a factor of 2-3 in my setups, and appears to be less CPU-efficient when running multiple chains on multiple cores. Any improvements there will be greatly appreciated.
Right. Warmup is currently slow because we only do a few (1-3) iterations at a time in TensorFlow before having to pass things back to R to tune the sampler parameters. That incurs some overhead each time, which adds up, especially for a small model. TF Probability now has a tuning interface, so when we switch over to using that we'll save a lot of overhead.
I think I might even be able to iterate the progress bar without leaving TensorFlow, which would speed up sampling a little too.
This issue is, apparently, related to #241 (also #284). I believe the root cause of #241 is that (according to the docs) when `pb_update <= thin`, `pb_update` is selected to be `thin + 1`. This is a poor choice because, as human beings, users tend to select `thin` divisible by e.g. 5 or 10, and therefore `pb_update` (i.e. `thin + 1`) can be a prime number, and not a divisor of `n_samples` or `warmup` in the `mcmc()` method.

Now, different values of
`pb_update` cause greta to fail at the end of sampling with a Python TensorArray zero-size error. I ran some tests for different combinations of `pb_update` and `thin` to check which values cause those crashes.

The best thing is that
`mcmc(verbose = FALSE)` appears to solve the problem, so the greta crashes are actually caused by UI behaviour. I was unable to locate the source of the problem, but I would say it happens at the last sampling step (maybe here). Changes introduced in #284 might be causing the problem.

In the following table, look at #4 and #7, which update by `10x + 1`. These appear to almost always crash, and that is exactly what happens when `pb_update <= thin`.

Created on 2019-09-26 by the reprex package (v0.3.0)
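To make the divisor problem concrete, here is a small sketch of the arithmetic (values are illustrative; the `thin + 1` selection is as described in the docs above):

```r
# Sketch of the pb_update selection discussed above (illustrative values).
# With thin divisible by 10, pb_update = thin + 1 is prime and does not
# divide a typical n_samples, leaving a short final progress-bar update.
thin      <- 10
pb_update <- thin + 1   # 11, a prime
n_samples <- 1000
n_samples %% pb_update  # 10 iterations left over at the last update
```

A leftover chunk like this at the final update would be consistent with the zero/short-sized TensorArray error appearing only at the very end of sampling.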