Closed: wfolta closed this issue 7 years ago.
I am not sure if this should work. Since brms currently uses a gamm4-like implementation, I assume it should not, but I have to go into the details to understand this thoroughly.
The error we see, however, is a bug in the parsing of smooth terms. I will try to fix it today and then we can try again to get the MRF smoothing working.
Correct, gamm4 doesn't support the "mrf" smooth, so I assume you also would not. In gamm4's case, it has to do with its dependence on lmer, which it uses under the hood via some kind of conversion trick. The ultimate solution for brms is -- as you mention in the documentation -- via the autocor= argument and a cor_brms object. But it may be that brm could sense the bs="mrf" and say, "brms doesn't yet support spatial autocorrelation, but when it does it will use the autocor= mechanism".
Or maybe you actually could use s's "mrf" smooth... It automatically handles things like turning polygons into an adjacency matrix and so on, so it has some useful magic built in. If it fits into your schema and provides similar power to a cor_brms solution.
You are right. First, I have to get rid of the current parsing bug when specifying argument xt, and then I will dig deeper into the MRF stuff. I think it is definitely worth implementing this sooner or later.
I think the bug has been fixed and the MRF smooth seems to work now in brms. At least for the sample data from the doc of mgcv, I get very similar estimates as compared to gamm, but of course we don't get such nice looking maps. ;-)
I would be grateful if you tried it out using the dev version from github.
I noticed that this example also works with gamm4 and I thus wonder if I missed something. Didn't you say gamm4 does not support MRF?
Odd, I just ran gam (from mgcv) and it works fine with the xt I'm using. When I use gamm4 (from the gamm4 package) with the same arguments, I get an error:
Error in if (as.numeric(vr[[i]] > 0)) sp[k] <- scale/as.numeric(vr[[i]]) else sp[k] <- 1e+10 :
argument is not interpretable as logical
In addition: Warning message:
In sqrt(D) : NaNs produced
Which I thought I'd traced down to gamm4 using lme4 under the hood, and only supporting a limited subset of smoothers -- univariate only? -- but maybe it's just the particular xt I tested with. (I'm using data from CARBayesdata that I had to convert to something "mrf" liked -- in gam, anyhow.)
When I ran the same thing in brm, I got an error:
Error in FUN(X[[i]], ...) : Stan does not support NA (in Zs_1_1) in data
In addition: Warning messages:
1: In sqrt(D) : NaNs produced
2: In sqrt(D) : NaNs produced
3: In sqrt(D) : NaNs produced
4: In sqrt(D) : NaNs produced
5: In sqrt(D) : NaNs produced
failed to preprocess the data; sampling not done
which is similar to what I get in gamm4. Hmmm... my conversion of the data seemingly satisfies the s(..., bs="mrf", ...) as used by gam, but not as used by gamm4 and brm. Perhaps they use different arguments in the underlying workhorse functions. So I'll have to try other spatial datasets.
P.S. I'm doing an ordinal logistic regression at work, and brm was way faster and more reliable than stan_polr. I didn't have time to figure everything out, but I never got stan_polr to run without divergence warnings, while brm did so the first time. I had to switch from cumulative to sratio (or something like that) before I started getting issues in brm. (cumulative makes sense for my use case, so the switch to sratio was more of an experiment, to see if any of the families would cause problems.)
Somewhere in the conversion, NA is produced, and gam(m) seems to handle them correctly, while gamm4 and brm don't. This is somewhat strange, as all functions use mgcv on the backend for the preparation of smoothing terms. Would you mind sending me a reproducible example? Maybe I can fix this in brms and point Simon Wood to this problem in gamm4.
The difference between gam(m) on the one hand and gamm4 and brm on the other is just that the latter two don't support the te and ti smoothers, but s and t2 work fine. For gamm4 this is because it uses lme4 internally, but for brm it is just a matter of the time I can spend on this feature, as the implementation of te and ti is less straightforward. The t2 smoother seems to be a good alternative to te and ti, though.
I started with the CARBayes example:
library(CARBayesdata)
data(GGHB.IG)
data(pricedata)
missing.IG <- setdiff(rownames(GGHB.IG@data), pricedata$IG)
missing.IG.row <- which(missing.IG == rownames(GGHB.IG@data))
propertydata.spatial <- GGHB.IG[-missing.IG.row, ]
propertydata.spatial@data <- data.frame(propertydata.spatial@data, pricedata)
propertydata.spatial@data$logprice <- log(propertydata.spatial@data$price)
propertydata.spatial@data$logdriveshop <- log(propertydata.spatial@data$driveshop)
Then I did a quick-n-dirty conversion of that data structure to something that works with an mrf smooth:
xt <- list(polys = lapply(propertydata.spatial@polygons, function(x) x@Polygons[[1]]@coords))
with gam:
g1 <- gam(logprice ~ s(IG, bs="mrf", k=20, xt=xt) + s(crime) + rooms + sales + type + logdriveshop, data=propertydata.spatial@data)
and with brm:
g3 <- brm(logprice ~ s(IG, bs="mrf", k=20, xt=xt) + s(crime) + rooms + sales + type + logdriveshop, data=propertydata.spatial@data)
Which yields NaN errors.
I'm glad I was wrong about the mrf smoother. The paper I'd been reading did a comparison and found that Bayesian CAR usually had the best results, followed by GAMMs, with GAMs usually not doing as well as either. A GAMM is more flexible -- and I get to use brm -- so I'm willing to use the nearly-as-good-but-second-best option instead of CARBayes, if I can.
The problem occurs only if k is set manually to a value smaller than the number of sites. I will investigate this further.
I realized that gamm also stops with an error. The only function that works is gam. From what I see, I believe this is a problem in the construction of the "random effects" parameterization of smoothing terms used in gamm, gamm4, as well as brm. In short, I think that setting k to a value lower than the number of sites (270 in this case) causes trouble in the MRF smoother, but I am not sure whether this is a bug or not. I will ask Simon Wood about it.
If I try gam without k=20, it gives an error:
Error in gam(logprice ~ s(IG, bs = "mrf", xt = xt) + s(crime) + rooms + :
Model has more coefficients than data
which I think makes sense: if the MRF smooth has as many EDF as rows of data to begin with, where is the slack to fit all of my other coefficients (including multiple knots for crime)?
So I set k=20 and gam is happy. I didn't think to not set it for gamm4 and brm.
You can call brm without setting k, but I am not sure how long it takes to fit the model / how well it converges.
Edit: Fitting the model with brms without setting k worked completely fine and took less than 2 minutes on my machine.
I was trying that as I was writing. Just finished using the brm call with just the k=20 removed, and I had to crank adapt_delta up to 0.99 to get it down to 4 divergent transitions, which took about 8 minutes to run on my laptop. Initial plots look okay, Rhat's okay, effective samples for sigma and MRF are only about 250. I'll try 0.999 and see if it works.
You could also try leaving adapt_delta at 0.99 and increasing iter instead. 4 divergent transitions are OK. Likely we just need more iterations / a longer warmup.
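For reference, the two suggestions combine into a single call like the sketch below. The formula and data are the ones from earlier in this thread; the iter/warmup values are illustrative rather than a recommendation, and control is simply forwarded by brms to Stan's sampler:

```r
library(brms)

# Sketch only: same model as above, with more iterations and a higher
# adapt_delta to reduce divergent transitions.
g3 <- brm(
  logprice ~ s(IG, bs = "mrf", xt = xt) + s(crime) +
    rooms + sales + type + logdriveshop,
  data = propertydata.spatial@data,
  chains = 4, iter = 4000, warmup = 2000,
  control = list(adapt_delta = 0.99)
)
```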
Thanks, will do. I used to figure if I got it down to a handful of divergent transitions I was okay. Then I watched Betancourt's (?) youtube video describing how Stan samples and walked away thinking even a single divergent transition is a sign that it's having trouble sampling part of the space and all bets may be off. (He didn't emphasize it that way, that's just my conclusion.)
Rstan recommends a pairs plot to diagnose divergences. Unfortunately, I don't have access to a machine that's fast enough to complete a pairs plot. In practice, it's a huge lattice plot of dozens (or hundreds) of scatter plots and it's sloooooooow. (Some of this may be ggplot, but the one thing about R that I find frustrating is how it bogs down on plots with lots of points.) So, someday I'll succeed in printing a pairs plot and then I'll learn how to interpret it.
Any alternative diagnostic you can think of would be a super-nice addition.
UPDATE: I ran 4000 iterations and got 16 divergent transitions. Thought the results might be interesting for you. (Seeing 5 coefficients makes me think that the ultimate MRF has 5 knots (or maybe intermediate knots, for a total of 6 or 7), which is obviously a lot fewer than 270. It does take a few extra seconds to print the Compiling Model message, which I assume is the smooth calculating furiously.)
Family: gaussian (identity)
Formula: logprice ~ s(IG, bs = "mrf", xt = xt) + s(crime) + rooms + sales + type + logdriveshop
Data: propertydata.spatial@data (Number of observations: 270)
Samples: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
total post-warmup samples = 8000
WAIC: Not computed
Spline Effects:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sds(sIGbsEQ"mrf"xtEQxt_1) 0.04 0.01 0.03 0.05 491 1.01
sds(scrime_1) 0.38 0.36 0.02 1.35 1161 1.01
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept 4.08 0.13 3.84 4.33 4071 1.00
rooms 0.23 0.03 0.17 0.28 5078 1.00
sales 0.00 0.00 0.00 0.00 7007 1.00
typeflat -0.28 0.06 -0.39 -0.16 2790 1.00
typesemi -0.18 0.05 -0.28 -0.07 3860 1.00
typeterrace -0.31 0.06 -0.44 -0.19 3172 1.00
logdriveshop 0.00 0.03 -0.06 0.06 2862 1.00
sIG_1 -0.16 0.17 -0.50 0.18 8000 1.00
sIG_2 -0.33 0.17 -0.67 0.01 8000 1.00
sIG_3 0.07 0.17 -0.26 0.39 8000 1.00
sIG_4 -0.05 0.17 -0.37 0.29 8000 1.00
sIG_5 1.17 0.17 0.84 1.49 8000 1.00
scrime_1 -0.08 0.08 -0.28 0.03 960 1.01
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma 0.16 0.02 0.12 0.18 516 1.01
Also, when I plot the marginal effects of the MRF (log price vs. IG), it works but also prints a warning message mentioning the T_SHOW_BACKTRACE environment variable.
The number of "fixed effects" used for a smoothing term does not indicate the number of knots. I am not sure on what basis mgcv decides how many of them are required.
You could use the argument pars in pairs to select the parameters you want displayed. When selecting only a few, making the pairs plot should take only a few seconds.
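For instance (a sketch only: the fit object g3 is from earlier in the thread, the parameter names are taken from the summary above, and pairs is called on the underlying stanfit via rstan):

```r
library(rstan)

# Restrict the pairs plot to a handful of parameters plus lp__.
# g3$fit is the stanfit object inside the brmsfit.
pairs(g3$fit, pars = c("b_rooms", "b_logdriveshop", "sigma", "lp__"))
```

Divergent transitions are highlighted in these panels, which helps show which region of the posterior the sampler is struggling with.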
In my experience, a few divergent transitions don't do too much harm, but the Stan team knows much better than I do, of course. You could look at the diagnostic plots of shinystan by calling launch_shiny on a brmsfit object. Maybe @jgabry has some more suggestions.
Yeah, you essentially always have to select only a few parameters. I would include lp__ and then strategically select a few other parameters depending on what you want to look at. The efficiency of the pairs plot isn't related to ggplot, though (I think that may have been suggested in a previous post somewhere). The rstan pairs plot is still done with base R plotting.
In theory a single divergence is enough to threaten the validity of the estimates. But a single divergence, or a single digit number of divergences, can usually be dealt with by lowering the initial step size and increasing adapt_delta (sometimes as high as 0.999 or even higher for nasty ones). So if you have 4 after increasing adapt_delta I would think you could probably get rid of them by increasing further and maybe also setting the initial stepsize to something like 1/10 of the stepsize it ends up with now. If this model is for an important project then I wouldn't ignore the divergences. (In other cases when there are a lot of divergences those tricks aren't going to work and the model needs to be reparameterized.)
Like Paul says, it's true that often you won't see much of a difference in the estimates when you get rid of a small number of divergences. But there's absolutely no guarantee.
Jonah
I haven't looked at the model, but MRF and the like can often have difficult posteriors. Are these Gaussian MRFs? The parameterization of the covariance/precision matrices can also matter a lot.
I am not really sure what exactly happens inside mgcv for this smooth term, but it appears to be working ok in Stan.
I don't know what mgcv is doing either, but yeah Stan can definitely handle these models. Glad it seems to be working.
@wfolta Since the initial parsing bug has been fixed and the problem when specifying k occurs inside mgcv (I already informed Simon Wood about it), are you OK if we close this issue?
I think you can close it, though there are really three moving parts here:
(1) The parsing bug in brms: fixed.
(2) The error when setting k: being investigated, but not fixed.
(3) Even when models run (without k, or someday when that part's fixed), it's not clear that the models that use MRF will successfully converge. The hope is that it's mostly converging (with only a small number of divergences) now with unconstrained k and will not have problems once we can turn k down. But that's not guaranteed.
The last two aren't closed. So maybe the best strategy is to open a different kind of issue for those two and close this one. (2) is sort-of a bug, though it affects other non-gam functions, so it's not a coding error on your part -- probably something Simon Wood's doing. (3) has to await (2) being clarified, and it may be that gam's MRF creates things that are tough for Stan to handle. (Ultimately, you may need to use brm's correlation methodology to create your own spatial options.)
I agree. The thing with (2) is that I can't do anything to fix it (other than informing Simon Wood, which I have already done) and once it is fixed, it will work in brms without any required changes.
Point (3) is tricky, because even if it doesn't work well in the end, it's hard to change from my side. After all, brms just takes the smooth design matrices returned by mgcv and multiplies them by the smoothing parameters (adding a hierarchical normal prior), using best practices for coding in Stan.
There is already an issue for implementing spatial correlation structures #6 and I just added a reminder to implement the CAR structure there.
Many thanks again for reporting this issue and for the detailed and interesting discussion about the topic.
Simon Wood responded saying that the error will be fixed in the next release of mgcv. I could already test it with the pre-release version and it seems to work correctly now. Using a lower k indeed decreases the sampling time drastically, and I got rid of all divergent transitions by setting adapt_delta = 0.95 for the example discussed in this issue.
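Putting that together, the working call should look roughly like the sketch below. It reuses the formula and data from earlier in this thread and assumes the mgcv pre-release containing the fix, so that k can be set below the number of sites (270 here); adapt_delta = 0.95 is the value mentioned above:

```r
library(brms)

# Sketch only: requires the mgcv version with Simon Wood's MRF fix.
fit <- brm(
  logprice ~ s(IG, bs = "mrf", k = 20, xt = xt) + s(crime) +
    rooms + sales + type + logdriveshop,
  data = propertydata.spatial@data,
  control = list(adapt_delta = 0.95)
)
```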
I'm trying to use the Markov Random Field smoother to do something geostatistical. gamm understands this smoother, while gamm4 does not. It appears that brm does, or at least wants to:
Should this work? I've been doing some work with geospatial models and the gold standard is supposed to be CAR models, followed by GAMM models, but of course to work the GAMM needs to support MRF smoothing -- at least if we intend to do areal calculations rather than point calculations.
[Love the improvements, again, in brms! Excited that areal geospatial models might be possible.]