Open imadmali opened 7 years ago
@jgabry @bgoodri I'm almost done with these spatial models. One thing that I'm unsure about is K_smooth
in the priors_glm.stan
chunk. I'm not really sure about what this is doing. Is this relevant for the predictors in a spatial model?
Unlikely. The K_smooth is the number of columns in the matrix representing a smooth function in stan_gamm(), like s(income, age, by = gender, k = 5) would come out to a K_smooth of 50 or something.
On Thu, Aug 10, 2017 at 12:15 AM, Imad Ali notifications@github.com wrote:
@jgabry https://github.com/jgabry @bgoodri https://github.com/bgoodri I'm almost done with these spatial models. One thing that I'm unsure about is K_smooth in the priors_glm.stan chunk. I'm not really sure about what this is doing. Is this relevant for the predictors in a spatial model?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/207#issuecomment-321446001, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqu_8qcn38ZAbL2QDDBrt2uQoXIG9ks5sWoPegaJpZM4OaV0H .
Okay, in that case I'll break up the tparameters_glm.stan
and priors_glm.stan
files into smaller chunks/
so that I can use them in the spatial stan file, if that's okay with you guys.
Sure, give it a try.
On Thu, Aug 10, 2017 at 12:25 AM Imad Ali notifications@github.com wrote:
Okay, in that case I'll break up the tparameters_glm.stan and priors_glm.stan files into smaller chunks/ so that I can use them in the spatial stan file, if that's okay with you guys.
— You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/207#issuecomment-321447245, or mute the thread https://github.com/notifications/unsubscribe-auth/AHb4Q8w2EnleKyzu6da0FDvpleV6Sykfks5sWoYcgaJpZM4OaV0H .
@bgoodri @jgabry currently the CAR spatial models are parameterized in terms of precision. So users are putting priors on precision and the parameters returned to the user are precision parameters. This is the way INLA does it (@dpsimpson can correct me if I'm wrong). Do we want to be consistent with INLA and stick with precision, or be consistent with the other modeling functions in rstanarm and use standard deviation?
USE STANDARD DEVIATION!!!!!
Priors on precision are hard to sense check. Inla/bugs is hard coded bad practice. We know better now.
Just make sure you do the scaling so the parameter is approximately interpretable as a stansaddd deviation.
D
On Sun, 13 Aug 2017 at 22:00, Imad Ali notifications@github.com wrote:
@bgoodri https://github.com/bgoodri @jgabry https://github.com/jgabry currently the CAR spatial models https://github.com/stan-dev/rstanarm/blob/feature/car/exec/spatial.stan#L124-L129 are parameterized in terms of precision. So users are putting priors on precision and the parameters returned to the user are precision parameters. This is the way INLA does it (@dpsimpson https://github.com/dpsimpson can correct me if I'm wrong). Do we want to be consistent with INLA and stick with precision, or be consistent with the other modeling functions in rstanarm and use standard deviation?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/207#issuecomment-322084131, or mute the thread https://github.com/notifications/unsubscribe-auth/AOKBBi0qcDDbKvaix464xhzLOrSdT37Lks5sX6pFgaJpZM4OaV0H .
Got it. @dpsimpson to be clear, based on the models here, the models in rstanarm here would then need to be the following,
if (model_type == 1) // besag
psi = phi * square(inv(tau));
else if (model_type == 2) // bym
psi = phi * square(rho[1]) + theta_raw * square(tau);
else if (model_type == 3) // bym2
psi = tau*(sqrt(1-rho[1])*theta_raw + sqrt(rho[1]/scaling_factor)*phi);
Is that correct?
@imamali, I agree with Dan. When you make that change also be careful about the default prior in case it's currently set to something that only makes sense for the precision.
On Mon, Aug 14, 2017 at 12:03 AM Imad Ali notifications@github.com wrote:
Got it. @dpsimpson https://github.com/dpsimpson to be clear, based on the models here https://arxiv.org/pdf/1601.01180.pdf, the models in rstanarm here https://github.com/stan-dev/rstanarm/blob/feature/car/exec/spatial.stan#L124-L129 would then need to be the following,
if (model_type == 1) // besag psi = phi square(inv(tau)); else if (model_type == 2) // bym psi = phi square(rho[1]) + theta_raw square(tau); else if (model_type == 3) // bym2 psi = tau(sqrt(1-rho[1])theta_raw + sqrt(rho[1]/scaling_factor)phi);
Is that correct?
— You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/207#issuecomment-322095243, or mute the thread https://github.com/notifications/unsubscribe-auth/AHb4Q7pMQdhwTbMGXsCAvMgG_NLFZc7vks5sX8cFgaJpZM4OaV0H .
btw, I think the Stan language has an inv_square function.
On Mon, Aug 14, 2017 at 1:11 AM Jonah Sol Gabry jgabry@gmail.com wrote:
@imamali, I agree with Dan. When you make that change also be careful about the default prior in case it's currently set to something that only makes sense for the precision.
On Mon, Aug 14, 2017 at 12:03 AM Imad Ali notifications@github.com wrote:
Got it. @dpsimpson https://github.com/dpsimpson to be clear, based on the models here https://arxiv.org/pdf/1601.01180.pdf, the models in rstanarm here https://github.com/stan-dev/rstanarm/blob/feature/car/exec/spatial.stan#L124-L129 would then need to be the following,
if (model_type == 1) // besag psi = phi square(inv(tau)); else if (model_type == 2) // bym psi = phi square(rho[1]) + theta_raw square(tau); else if (model_type == 3) // bym2 psi = tau(sqrt(1-rho[1])theta_raw + sqrt(rho[1]/scaling_factor)phi);
Is that correct?
— You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/207#issuecomment-322095243, or mute the thread https://github.com/notifications/unsubscribe-auth/AHb4Q7pMQdhwTbMGXsCAvMgG_NLFZc7vks5sX8cFgaJpZM4OaV0H .
@dpsimpson it looks like you're applying a constraint when you're computing the scaling factor,
inla.qinv(Q_pert, constr=list(A = matrix(1,1,nrow(Q)),e=0))
But I don't think the recursion that you suggested in the paper accounts for this. Since, for the first element in the recursion, we have,
Sigma[N,N] = 1/(L[N,N]^2)
Any thoughts on what to do about this? (I can't even use solve(Q_pert)
in the meantime while I sort out the recursion since the diagonals of that matchinla.qinv(Q_pert)
, and not the formula above.)
It's section 2.2 of the paper.
D
Sent from my iPad
On 14 Aug 2017, at 22:32, Imad Ali notifications@github.com wrote:
@dpsimpson it looks like you're applying a constraint when you're computing the scaling factor, inla.qinv(Q_pert, constr=list(A = matrix(1,1,nrow(Q)),e=0))
But I don't think the recursion that you suggested in the paper accounts for this. Since, for the first element in the recursion, we have, Sigma[N,N] = 1/(L[N,N]^2)
Any thoughts on what to do about this? (I can't even use solve(Q_pert) in the meantime while I sort out the recursion since the diagonals of that matchinla.qinv(Q_pert), and not the formula above.)
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.
@dpsimpson thanks. For now I'm using solve
with the constraint in section 2.2 (which matches the scaling factor you compute via inla.qinv
) until I can work out what's going on with the recursions I wrote.
@imadmali Is this close enough to make a pull request? The eventual merge is going to be harder if we have to wait until after the next rstanarm release. If it is not quite ready but compiles, it is possible that we could merge stuff but not expose the stan_
functions to users yet.
@bgoodri, yeah I think it's ready for a pull request. The modeling functions work but they need to be tested properly. I'll try to submit a pull request tonight/tomorrow. When are you aiming to release?
As soon as Sean releases Stan Math 2.17.1
On Mon, Nov 13, 2017 at 10:05 AM, Imad Ali notifications@github.com wrote:
@bgoodri https://github.com/bgoodri, yeah I think it's ready for a pull request. The modeling functions work but they need to be tested properly. I'll try to submit a pull request tonight/tomorrow. When are you aiming to release?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/207#issuecomment-343946568, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqntZmaJtVZl9ACu1qt5N-FPjJ65kks5s2FqegaJpZM4OaV0H .
where are things at with this feature? I've worked out a BYM2 model for regions which have disconnected components and/or islands which depend on running R helper functions. It's rather complicated - the relevant files are checked into the ICAR case study:
Imad's version, which is a bit more in the unreadable rstanarm style is on this branch
https://github.com/stan-dev/rstanarm/blob/feature/car/exec/spatial.stan
On Mon, Nov 27, 2017 at 3:24 PM, Mitzi Morris notifications@github.com wrote:
where are things at with this feature? I've worked out a BYM2 model for regions which have disconnected components and/or islands which depend on running R helper functions. It's rather complicated - the relevant files are checked into the ICAR case study:
- https://github.com/stan-dev/example-models/blob/master/ knitr/car-iar-poisson/bym2.stan https://github.com/stan-dev/example-models/blob/master/knitr/car-iar-poisson/bym2.stan
- https://github.com/stan-dev/example-models/blob/master/ knitr/car-iar-poisson/bym2_islands.stan https://github.com/stan-dev/example-models/blob/master/knitr/car-iar-poisson/bym2_islands.stan
- https://github.com/stan-dev/example-models/blob/master/ knitr/car-iar-poisson/nb_data_funs.R https://github.com/stan-dev/example-models/blob/master/knitr/car-iar-poisson/nb_data_funs.R
- https://github.com/stan-dev/example-models/blob/master/ knitr/car-iar-poisson/fit_bym2_connected.R https://github.com/stan-dev/example-models/blob/master/knitr/car-iar-poisson/fit_bym2_connected.R
- https://github.com/stan-dev/example-models/blob/master/ knitr/car-iar-poisson/fit_bym2_islands.R https://github.com/stan-dev/example-models/blob/master/knitr/car-iar-poisson/fit_bym2_islands.R
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/207#issuecomment-347316234, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqiMBBJZB_JHC1tYSfL3eo2_BgNBKks5s6xp-gaJpZM4OaV0H .
According to @imadmali, we are waiting for @dpsimpson to verify some numerical values are close to those produced by INLA. It seems as if anyone with INLA installed should be able to do that. CC @mitzimorris .
Currently I'm sorting out merge conflicts since the structure of the package is a bit different compared to when I last merged in from master. Once I've done that I can add the disconnected graph BYM2 stuff that @mitzimorris showed me. I sent @dpsimpson some INLA code a while back to make sure I'm matching spatial models correctly between INLA and rstanarm.
@dpsimpson needs to check his emails apparently...
On Sun, 14 Jan 2018 at 20:47, Imad Ali notifications@github.com wrote:
Currently I'm sorting out merge conflicts since the structure of the package is a bit different compared to when I last merged in from master. Once I've done that I can add the disconnected graph BYM2 stuff that @mitzimorris https://github.com/mitzimorris showed me. I sent @dpsimpson https://github.com/dpsimpson some INLA code a while back to make sure I'm matching spatial models correctly between INLA and rstanarm.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/207#issuecomment-357563059, or mute the thread https://github.com/notifications/unsubscribe-auth/AOKBBoYqJFnFVxDDbPhHiIpnXMLJnMiRks5tKq4mgaJpZM4OaV0H .
@imadmali Basically, just move your main .stan files to the root of src/stan_files, put any chunks into the appropriate subfolder of src/stan_files/, and refer to them using the official Stan mechanism (#include folder/file.stan
flush-left with nothing, not even whitespace, after the filename). The rest is the same I think.
@bgoodri @jgabry I believe the feature/car branch as of 2da6c77 a stable state (models fit and methods work). The only major thing that's left is adding a fourth model that handles disconnected graphs. With regard to unit tests, since we are not actually emulating an R package, I'm thinking it might be best to just use stand alone stan files that we know produce the right parameters (@dpsimpson can vet them), and feed these into the appropriate unit tests.
OK, we can do the disconnected graph thing later.
On Sun, Jan 21, 2018 at 7:44 PM, Imad Ali notifications@github.com wrote:
@bgoodri https://github.com/bgoodri @jgabry https://github.com/jgabry I believe the feature/car branch as of 2da6c77 https://github.com/stan-dev/rstanarm/commit/2da6c7726c0fced98ad6da8e97be686791845d26 a stable state (models fit and methods work). The only major thing that's left is adding a fourth model that handles disconnected graphs.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/207#issuecomment-359297208, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqsxRmuF4ey2dhURR5DRpmKpBXRpoks5tM9nrgaJpZM4OaV0H .
been comparing how to set sum-to-zero constraint on ICAR component - using a prior to tightly constrain mean(phi) to 0 is much faster than hard sum-to-zero constraint and gets the same answers for the data I was testing on - on a model w/ just ICAR component it's twice as fast.
here's the soft-center version of ICAR:
data {
int<lower=0> N;
int<lower=0> N_edges;
int<lower=1, upper=N> node1[N_edges]; // node1[i] adjacent to node2[i]
int<lower=1, upper=N> node2[N_edges]; // and node1[i] < node2[i]
int<lower=0> y[N]; // count outcomes
vector<lower=0>[N] E; // exposure
}
transformed data {
vector[N] log_E = log(E);
}
parameters {
real alpha; // intercept
real<lower=0> sigma; // overall standard deviation
vector[N] phi; // spatial effects
}
model {
y ~ poisson_log(log_E + alpha + phi * sigma);
target += -0.5 * dot_self(phi[node1] - phi[node2]);
alpha ~ normal(0.0, 2.5);
sigma ~ normal(0.0, 5.0);
// soft sum-to-zero constraint on phi)
sum(phi) ~ normal(0, 0.001 * N); // equivalent to mean(phi) ~ normal(0,0.001)
}
(post edited to remove earlier misstatement about this version doing less work - it's not doing less work - it's just fitting better. have now tested on oblig Scotland dataset as well as NYC dataset for 2000 areal units).
Can't we just take alpha
out and let phi
have whatever center is
consistent with the data?
On Sun, Jan 21, 2018 at 8:06 PM, Mitzi Morris notifications@github.com wrote:
been comparing how to set sum-to-zero constraint on ICAR component - using a prior to tightly constrain mean(phi) to 0 is much faster than hard sum-to-zero constraint and gets the same answers for the data I was testing on - on a model w/ just ICAR component it's twice as fast. is this due to the fact that hard sum-to-zero constraint creates another whole vector of variables so you've got twice as many things to compute at each step?
here's the soft-center version of ICAR:
data { int
N; int N_edges; int node1[N_edges]; // node1[i] adjacent to node2[i] int node2[N_edges]; // and node1[i] < node2[i] int
y[N]; // count outcomes vector [N] E; // exposure } transformed data { vector[N] log_E = log(E); } parameters { real alpha; // intercept real sigma; // overall standard deviation vector[N] phi; // spatial effects } model { y ~ poisson_log(log_E + alpha + phi sigma); target += -0.5 dot_self(phi[node1] - phi[node2]); alpha ~ normal(0.0, 2.5); sigma ~ normal(0.0, 5.0); // soft sum-to-zero constraint on phi) sum(phi) ~ normal(0, 0.001 * N); // equivalent to mean(phi) ~ normal(0,0.001) }
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/207#issuecomment-359299041, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqsTZ-1EAMO1fJnDzr7xTNeoAxQEPks5tM975gaJpZM4OaV0H .
alpha
is supposed to be interpreted as the constant in the linear predictor and phi
is supposed to pick up the spatial effect. If we drop alpha
then we'll be preventing users from specifying a constant in their model.
I would be okay with preventing users from introducing a constant if it
makes the sampling performance acceptable for a wider set of inputs. Or
even just defining alpha
in generated quantities as the mean of phi
and
then redefining phi
as the deviation from alpha
.
On Sun, Jan 21, 2018 at 8:14 PM, Imad Ali notifications@github.com wrote:
alpha is supposed to be interpreted as the constant in the linear predictor and phi is supposed to pick up the spatial effect. If we drop alpha then we'll be preventing users from specifying a constant in their model.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/207#issuecomment-359299867, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqtbw9fbf5GdTfVDU3GD68qEJ6CrVks5tM-DxgaJpZM4OaV0H .
Anyone know if this is ready for a PR?
I'm picking this up again. Sorry for the long hiatus. Going over the code it looks like the following link parings (below) fail. The unit tests only check to see if the modeling functions and appropriate methods work. loo
returns and error since I think we still need to sort out what loo is for the CAR-type models.
Failing family/link pairings
stan_besag
binomial(link = "log")
possion(link = "identity")
neg_binomial_2(link = "identity")
Gamma(link = "identity")
Gamma(link = "inverse")
stan_bym
binomial(link = "log")
possion(link = "identity")
neg_binomial_2(link = "identity")
Gamma(link = "identity")
Gamma(link = "inverse")
stan_bym2
binomial(link = "log")
poisson(link = "identity")
neg_binomial_2(link = "identity")
Gamma(link = "identity")
Those family / link combos are problematic because not all realizations of the ordinary linear predictor yield a valid conditional mean. For the other Stan programs, we had to do shit like subtracting the minimum or maximum from the initial eta and then adding on the intercept which was restricted to be positive.
On Wed, Sep 5, 2018 at 5:36 PM Imad Ali notifications@github.com wrote:
I'm picking this up again. Sorry for the long hiatus. Going over the code it looks like the following link parings (below) fail. The unit tests only check to see if the modeling functions and appropriate methods work. loo returns and error since I think we still need to sort out what loo is for the CAR-type models.
Failing family/link pairings
- stan_besag
- possion(link = "identity")
- neg_binomial_2(link = "identity")
- Gamma(link = "identity")
- Gamma(link = "inverse")
- stan_bym
- binomial(link= = "log")
- possion(link = "identity")
- neg_binomial_2(link = "identity")
- Gamma(link = "identity")
- Gamma(link = "inverse")
- stan_bym2
- binomial(link = "log")
- poisson(link = "identity")
- neg_binomial_2(link = "identity")
- Gamma(link = "identity")
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/207#issuecomment-418889802, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqpetgs-zotIzMszogTnH_hrJCFELks5uYEPygaJpZM4OaV0H .
Oh yeah, I think I remember having to do something similar for beta regression.
Also, @bgoodri how do I get the latest version of StanHeaders (2.18.0)? Do I have to clone the rstan repo and build locally?
Yeah
On Thu, Sep 6, 2018 at 12:08 AM Imad Ali notifications@github.com wrote:
Also, @bgoodri https://github.com/bgoodri how do I get the latest version of StanHeaders (2.18.0)? Do I have to clone the rstan repo and build locally?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/207#issuecomment-418958287, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqi70E4uQUhjvIw9-jdT6FPzCSZN6ks5uYJ_EgaJpZM4OaV0H .
I went through the first few lines of the install_StanHeaders.R
file and got the following error:
Any idea what's going on?
@bgoodri for unit tests do I need to do every family-link combo or should I just do a single link function for each family (since all the combos are being tested by other unit tests and I'm reusing the stan family-link functions).
I would test that the likelihood is correct at a point (such as a MLE) for every family-link combo from test_stan_functions.R but I would only do MCMC / optimization / VB for as little as necessary. It takes rstanarm so long to install on Windows already that we do not have time for much testing.
On Sun, Oct 21, 2018 at 8:11 PM Imad Ali notifications@github.com wrote:
@bgoodri https://github.com/bgoodri for unit tests do I need to do every family-link combo or should I just do a single link function for each family (since all the combos are being tested by other unit tests and I'm reusing the stan family-link functions).
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/207#issuecomment-431716208, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqlD4PjejBMdtxjwhKqTFw5QmiFp6ks5unQ0_gaJpZM4OaV0H .
Ok. I'm not using any new likelihood functions and they seem to all be tested in test_stan_functions.R
.
Sounds good
On Mon, Oct 22, 2018 at 2:38 PM Imad Ali notifications@github.com wrote:
Ok. I'm not using any new likelihood functions and they seem to all be tested in test_stan_functions.R.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/207#issuecomment-431929403, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqr3cpsdJ1IdwXFyInq14gmnaX70Rks5unhCOgaJpZM4OaV0H .
Starting an issue to track the feature/car branch which introduces the conditional autoregressive (CAR) class of spatial models into rstanarm.
Currently it looks like,
stan_besag.R
is the intrinsic conditional autoregressive (ICAR) model (i.e. just a GMRF with a sum-to-zero constraint.)stan_bym.R
is a CAR model which involves a convolution between the GMRF and noise.Stuff to implement for the models above,
scaling_factor
instan_bym
withINLA::inla.qinv
)