stan-dev / rstanarm

rstanarm R package for Bayesian applied regression modeling
https://mc-stan.org/rstanarm
GNU General Public License v3.0
383 stars 132 forks source link

feature/car #207

Open imadmali opened 7 years ago

imadmali commented 7 years ago

Starting an issue to track the feature/car branch which introduces the conditional autoregressive (CAR) class of spatial models into rstanarm.

Currently it looks like,

            ┌─── stan_besag.R ────┐
user ───────┤─── stan_bym.R  ─────├──── stan_spatial.fit.R ──── spatial.stan
            └─── stan_bym2.R ─────┘

Stuff to implement for the models above,

imadmali commented 7 years ago

@jgabry @bgoodri I'm almost done with these spatial models. One thing that I'm unsure about is K_smooth in the priors_glm.stan chunk. I'm not really sure about what this is doing. Is this relevant for the predictors in a spatial model?

bgoodri commented 7 years ago

Unlikely. The K_smooth is the number of columns in the matrix representing a smooth function in stan_gamm(), like s(income, age, by = gender, k = 5) would come out to a K_smooth of 50 or something.

On Thu, Aug 10, 2017 at 12:15 AM, Imad Ali notifications@github.com wrote:

@jgabry https://github.com/jgabry @bgoodri https://github.com/bgoodri I'm almost done with these spatial models. One thing that I'm unsure about is K_smooth in the priors_glm.stan chunk. I'm not really sure about what this is doing. Is this relevant for the predictors in a spatial model?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/207#issuecomment-321446001, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqu_8qcn38ZAbL2QDDBrt2uQoXIG9ks5sWoPegaJpZM4OaV0H .

imadmali commented 7 years ago

Okay, in that case I'll break up the tparameters_glm.stan and priors_glm.stan files into smaller chunks/ so that I can use them in the spatial stan file, if that's okay with you guys.

jgabry commented 7 years ago

Sure, give it a try.

On Thu, Aug 10, 2017 at 12:25 AM Imad Ali notifications@github.com wrote:

Okay, in that case I'll break up the tparameters_glm.stan and priors_glm.stan files into smaller chunks/ so that I can use them in the spatial stan file, if that's okay with you guys.

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/207#issuecomment-321447245, or mute the thread https://github.com/notifications/unsubscribe-auth/AHb4Q8w2EnleKyzu6da0FDvpleV6Sykfks5sWoYcgaJpZM4OaV0H .

imadmali commented 7 years ago

@bgoodri @jgabry currently the CAR spatial models are parameterized in terms of precision. So users are putting priors on precision and the parameters returned to the user are precision parameters. This is the way INLA does it (@dpsimpson can correct me if I'm wrong). Do we want to be consistent with INLA and stick with precision, or be consistent with the other modeling functions in rstanarm and use standard deviation?

dpsimpson commented 7 years ago

USE STANDARD DEVIATION!!!!!

Priors on precision are hard to sense check. Inla/bugs is hard coded bad practice. We know better now.

Just make sure you do the scaling so the parameter is approximately interpretable as a stansaddd deviation.

D

On Sun, 13 Aug 2017 at 22:00, Imad Ali notifications@github.com wrote:

@bgoodri https://github.com/bgoodri @jgabry https://github.com/jgabry currently the CAR spatial models https://github.com/stan-dev/rstanarm/blob/feature/car/exec/spatial.stan#L124-L129 are parameterized in terms of precision. So users are putting priors on precision and the parameters returned to the user are precision parameters. This is the way INLA does it (@dpsimpson https://github.com/dpsimpson can correct me if I'm wrong). Do we want to be consistent with INLA and stick with precision, or be consistent with the other modeling functions in rstanarm and use standard deviation?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/207#issuecomment-322084131, or mute the thread https://github.com/notifications/unsubscribe-auth/AOKBBi0qcDDbKvaix464xhzLOrSdT37Lks5sX6pFgaJpZM4OaV0H .

imadmali commented 7 years ago

Got it. @dpsimpson to be clear, based on the models here, the models in rstanarm here would then need to be the following,

  if (model_type == 1)         // besag
     psi = phi * square(inv(tau));
  else if (model_type == 2)    // bym
     psi = phi * square(rho[1]) + theta_raw * square(tau);
  else if (model_type == 3)    // bym2
     psi = tau*(sqrt(1-rho[1])*theta_raw + sqrt(rho[1]/scaling_factor)*phi);

Is that correct?

jgabry commented 7 years ago

@imamali, I agree with Dan. When you make that change also be careful about the default prior in case it's currently set to something that only makes sense for the precision.

On Mon, Aug 14, 2017 at 12:03 AM Imad Ali notifications@github.com wrote:

Got it. @dpsimpson https://github.com/dpsimpson to be clear, based on the models here https://arxiv.org/pdf/1601.01180.pdf, the models in rstanarm here https://github.com/stan-dev/rstanarm/blob/feature/car/exec/spatial.stan#L124-L129 would then need to be the following,

if (model_type == 1) // besag psi = phi square(inv(tau)); else if (model_type == 2) // bym psi = phi square(rho[1]) + theta_raw square(tau); else if (model_type == 3) // bym2 psi = tau(sqrt(1-rho[1])theta_raw + sqrt(rho[1]/scaling_factor)phi);

Is that correct?

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/207#issuecomment-322095243, or mute the thread https://github.com/notifications/unsubscribe-auth/AHb4Q7pMQdhwTbMGXsCAvMgG_NLFZc7vks5sX8cFgaJpZM4OaV0H .

jgabry commented 7 years ago

btw, I think the Stan language has an inv_square function.

On Mon, Aug 14, 2017 at 1:11 AM Jonah Sol Gabry jgabry@gmail.com wrote:

@imamali, I agree with Dan. When you make that change also be careful about the default prior in case it's currently set to something that only makes sense for the precision.

On Mon, Aug 14, 2017 at 12:03 AM Imad Ali notifications@github.com wrote:

Got it. @dpsimpson https://github.com/dpsimpson to be clear, based on the models here https://arxiv.org/pdf/1601.01180.pdf, the models in rstanarm here https://github.com/stan-dev/rstanarm/blob/feature/car/exec/spatial.stan#L124-L129 would then need to be the following,

if (model_type == 1) // besag psi = phi square(inv(tau)); else if (model_type == 2) // bym psi = phi square(rho[1]) + theta_raw square(tau); else if (model_type == 3) // bym2 psi = tau(sqrt(1-rho[1])theta_raw + sqrt(rho[1]/scaling_factor)phi);

Is that correct?

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/207#issuecomment-322095243, or mute the thread https://github.com/notifications/unsubscribe-auth/AHb4Q7pMQdhwTbMGXsCAvMgG_NLFZc7vks5sX8cFgaJpZM4OaV0H .

imadmali commented 7 years ago

@dpsimpson it looks like you're applying a constraint when you're computing the scaling factor, inla.qinv(Q_pert, constr=list(A = matrix(1,1,nrow(Q)),e=0))

But I don't think the recursion that you suggested in the paper accounts for this. Since, for the first element in the recursion, we have, Sigma[N,N] = 1/(L[N,N]^2)

Any thoughts on what to do about this? (I can't even use solve(Q_pert) in the meantime while I sort out the recursion since the diagonals of that matchinla.qinv(Q_pert), and not the formula above.)

dpsimpson commented 7 years ago

It's section 2.2 of the paper.

D

Sent from my iPad

On 14 Aug 2017, at 22:32, Imad Ali notifications@github.com wrote:

@dpsimpson it looks like you're applying a constraint when you're computing the scaling factor, inla.qinv(Q_pert, constr=list(A = matrix(1,1,nrow(Q)),e=0))

But I don't think the recursion that you suggested in the paper accounts for this. Since, for the first element in the recursion, we have, Sigma[N,N] = 1/(L[N,N]^2)

Any thoughts on what to do about this? (I can't even use solve(Q_pert) in the meantime while I sort out the recursion since the diagonals of that matchinla.qinv(Q_pert), and not the formula above.)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

imadmali commented 7 years ago

@dpsimpson thanks. For now I'm using solve with the constraint in section 2.2 (which matches the scaling factor you compute via inla.qinv) until I can work out what's going on with the recursions I wrote.

bgoodri commented 6 years ago

@imadmali Is this close enough to make a pull request? The eventual merge is going to be harder if we have to wait until after the next rstanarm release. If it is not quite ready but compiles, it is possible that we could merge stuff but not expose the stan_ functions to users yet.

imadmali commented 6 years ago

@bgoodri, yeah I think it's ready for a pull request. The modeling functions work but they need to be tested properly. I'll try to submit a pull request tonight/tomorrow. When are you aiming to release?

bgoodri commented 6 years ago

As soon as Sean releases Stan Math 2.17.1

On Mon, Nov 13, 2017 at 10:05 AM, Imad Ali notifications@github.com wrote:

@bgoodri https://github.com/bgoodri, yeah I think it's ready for a pull request. The modeling functions work but they need to be tested properly. I'll try to submit a pull request tonight/tomorrow. When are you aiming to release?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/207#issuecomment-343946568, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqntZmaJtVZl9ACu1qt5N-FPjJ65kks5s2FqegaJpZM4OaV0H .

mitzimorris commented 6 years ago

where are things at with this feature? I've worked out a BYM2 model for regions which have disconnected components and/or islands which depend on running R helper functions. It's rather complicated - the relevant files are checked into the ICAR case study:

bgoodri commented 6 years ago

Imad's version, which is a bit more in the unreadable rstanarm style is on this branch

https://github.com/stan-dev/rstanarm/blob/feature/car/exec/spatial.stan

On Mon, Nov 27, 2017 at 3:24 PM, Mitzi Morris notifications@github.com wrote:

where are things at with this feature? I've worked out a BYM2 model for regions which have disconnected components and/or islands which depend on running R helper functions. It's rather complicated - the relevant files are checked into the ICAR case study:

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/207#issuecomment-347316234, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqiMBBJZB_JHC1tYSfL3eo2_BgNBKks5s6xp-gaJpZM4OaV0H .

bgoodri commented 6 years ago

According to @imadmali, we are waiting for @dpsimpson to verify some numerical values are close to those produced by INLA. It seems as if anyone with INLA installed should be able to do that. CC @mitzimorris .

imadmali commented 6 years ago

Currently I'm sorting out merge conflicts since the structure of the package is a bit different compared to when I last merged in from master. Once I've done that I can add the disconnected graph BYM2 stuff that @mitzimorris showed me. I sent @dpsimpson some INLA code a while back to make sure I'm matching spatial models correctly between INLA and rstanarm.

dpsimpson commented 6 years ago

@dpsimpson needs to check his emails apparently...

On Sun, 14 Jan 2018 at 20:47, Imad Ali notifications@github.com wrote:

Currently I'm sorting out merge conflicts since the structure of the package is a bit different compared to when I last merged in from master. Once I've done that I can add the disconnected graph BYM2 stuff that @mitzimorris https://github.com/mitzimorris showed me. I sent @dpsimpson https://github.com/dpsimpson some INLA code a while back to make sure I'm matching spatial models correctly between INLA and rstanarm.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/207#issuecomment-357563059, or mute the thread https://github.com/notifications/unsubscribe-auth/AOKBBoYqJFnFVxDDbPhHiIpnXMLJnMiRks5tKq4mgaJpZM4OaV0H .

bgoodri commented 6 years ago

@imadmali Basically, just move your main .stan files to the root of src/stan_files, put any chunks into the appropriate subfolder of src/stan_files/, and refer to them using the official Stan mechanism (#include folder/file.stan flush-left with nothing, not even whitespace, after the filename). The rest is the same I think.

imadmali commented 6 years ago

@bgoodri @jgabry I believe the feature/car branch as of 2da6c77 a stable state (models fit and methods work). The only major thing that's left is adding a fourth model that handles disconnected graphs. With regard to unit tests, since we are not actually emulating an R package, I'm thinking it might be best to just use stand alone stan files that we know produce the right parameters (@dpsimpson can vet them), and feed these into the appropriate unit tests.

bgoodri commented 6 years ago

OK, we can do the disconnected graph thing later.

On Sun, Jan 21, 2018 at 7:44 PM, Imad Ali notifications@github.com wrote:

@bgoodri https://github.com/bgoodri @jgabry https://github.com/jgabry I believe the feature/car branch as of 2da6c77 https://github.com/stan-dev/rstanarm/commit/2da6c7726c0fced98ad6da8e97be686791845d26 a stable state (models fit and methods work). The only major thing that's left is adding a fourth model that handles disconnected graphs.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/207#issuecomment-359297208, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqsxRmuF4ey2dhURR5DRpmKpBXRpoks5tM9nrgaJpZM4OaV0H .

mitzimorris commented 6 years ago

been comparing how to set sum-to-zero constraint on ICAR component - using a prior to tightly constrain mean(phi) to 0 is much faster than hard sum-to-zero constraint and gets the same answers for the data I was testing on - on a model w/ just ICAR component it's twice as fast.

here's the soft-center version of ICAR:

data {
  int<lower=0> N;
  int<lower=0> N_edges;
  int<lower=1, upper=N> node1[N_edges];  // node1[i] adjacent to node2[i]
  int<lower=1, upper=N> node2[N_edges];  // and node1[i] < node2[i]

  int<lower=0> y[N];              // count outcomes
  vector<lower=0>[N] E;           // exposure
}
transformed data {
  vector[N] log_E = log(E);
}
parameters {
  real alpha;             // intercept
  real<lower=0> sigma;    // overall standard deviation
  vector[N] phi;  // spatial effects
}
model {
  y ~ poisson_log(log_E + alpha + phi * sigma);
  target += -0.5 * dot_self(phi[node1] - phi[node2]);
  alpha ~ normal(0.0, 2.5);
  sigma ~ normal(0.0, 5.0);

  // soft sum-to-zero constraint on phi)
  sum(phi) ~ normal(0, 0.001 * N);  // equivalent to mean(phi) ~ normal(0,0.001)
}

(post edited to remove earlier misstatement about this version doing less work - it's not doing less work - it's just fitting better. have now tested on oblig Scotland dataset as well as NYC dataset for 2000 areal units).

bgoodri commented 6 years ago

Can't we just take alpha out and let phi have whatever center is consistent with the data?

On Sun, Jan 21, 2018 at 8:06 PM, Mitzi Morris notifications@github.com wrote:

been comparing how to set sum-to-zero constraint on ICAR component - using a prior to tightly constrain mean(phi) to 0 is much faster than hard sum-to-zero constraint and gets the same answers for the data I was testing on - on a model w/ just ICAR component it's twice as fast. is this due to the fact that hard sum-to-zero constraint creates another whole vector of variables so you've got twice as many things to compute at each step?

here's the soft-center version of ICAR:

data { int N; int N_edges; int node1[N_edges]; // node1[i] adjacent to node2[i] int node2[N_edges]; // and node1[i] < node2[i]

int y[N]; // count outcomes vector[N] E; // exposure } transformed data { vector[N] log_E = log(E); } parameters { real alpha; // intercept real sigma; // overall standard deviation vector[N] phi; // spatial effects } model { y ~ poisson_log(log_E + alpha + phi sigma); target += -0.5 dot_self(phi[node1] - phi[node2]); alpha ~ normal(0.0, 2.5); sigma ~ normal(0.0, 5.0);

// soft sum-to-zero constraint on phi) sum(phi) ~ normal(0, 0.001 * N); // equivalent to mean(phi) ~ normal(0,0.001) }

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/207#issuecomment-359299041, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqsTZ-1EAMO1fJnDzr7xTNeoAxQEPks5tM975gaJpZM4OaV0H .

imadmali commented 6 years ago

alpha is supposed to be interpreted as the constant in the linear predictor and phi is supposed to pick up the spatial effect. If we drop alpha then we'll be preventing users from specifying a constant in their model.

bgoodri commented 6 years ago

I would be okay with preventing users from introducing a constant if it makes the sampling performance acceptable for a wider set of inputs. Or even just defining alpha in generated quantities as the mean of phi and then redefining phi as the deviation from alpha.

On Sun, Jan 21, 2018 at 8:14 PM, Imad Ali notifications@github.com wrote:

alpha is supposed to be interpreted as the constant in the linear predictor and phi is supposed to pick up the spatial effect. If we drop alpha then we'll be preventing users from specifying a constant in their model.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/207#issuecomment-359299867, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqtbw9fbf5GdTfVDU3GD68qEJ6CrVks5tM-DxgaJpZM4OaV0H .

jgabry commented 6 years ago

Anyone know if this is ready for a PR?

imadmali commented 6 years ago

I'm picking this up again. Sorry for the long hiatus. Going over the code it looks like the following link parings (below) fail. The unit tests only check to see if the modeling functions and appropriate methods work. loo returns and error since I think we still need to sort out what loo is for the CAR-type models.

Failing family/link pairings

bgoodri commented 6 years ago

Those family / link combos are problematic because not all realizations of the ordinary linear predictor yield a valid conditional mean. For the other Stan programs, we had to do shit like subtracting the minimum or maximum from the initial eta and then adding on the intercept which was restricted to be positive.

On Wed, Sep 5, 2018 at 5:36 PM Imad Ali notifications@github.com wrote:

I'm picking this up again. Sorry for the long hiatus. Going over the code it looks like the following link parings (below) fail. The unit tests only check to see if the modeling functions and appropriate methods work. loo returns and error since I think we still need to sort out what loo is for the CAR-type models.

Failing family/link pairings

  • stan_besag
    • possion(link = "identity")
    • neg_binomial_2(link = "identity")
    • Gamma(link = "identity")
    • Gamma(link = "inverse")
  • stan_bym
    • binomial(link= = "log")
    • possion(link = "identity")
    • neg_binomial_2(link = "identity")
    • Gamma(link = "identity")
    • Gamma(link = "inverse")
  • stan_bym2
    • binomial(link = "log")
    • poisson(link = "identity")
    • neg_binomial_2(link = "identity")
    • Gamma(link = "identity")

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/207#issuecomment-418889802, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqpetgs-zotIzMszogTnH_hrJCFELks5uYEPygaJpZM4OaV0H .

imadmali commented 6 years ago

Oh yeah, I think I remember having to do something similar for beta regression.

imadmali commented 6 years ago

Also, @bgoodri how do I get the latest version of StanHeaders (2.18.0)? Do I have to clone the rstan repo and build locally?

bgoodri commented 6 years ago

Yeah

On Thu, Sep 6, 2018 at 12:08 AM Imad Ali notifications@github.com wrote:

Also, @bgoodri https://github.com/bgoodri how do I get the latest version of StanHeaders (2.18.0)? Do I have to clone the rstan repo and build locally?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/207#issuecomment-418958287, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqi70E4uQUhjvIw9-jdT6FPzCSZN6ks5uYJ_EgaJpZM4OaV0H .

imadmali commented 6 years ago

I went through the first few lines of the install_StanHeaders.R file and got the following error:

image

Any idea what's going on?

imadmali commented 5 years ago

@bgoodri for unit tests do I need to do every family-link combo or should I just do a single link function for each family (since all the combos are being tested by other unit tests and I'm reusing the stan family-link functions).

bgoodri commented 5 years ago

I would test that the likelihood is correct at a point (such as a MLE) for every family-link combo from test_stan_functions.R but I would only do MCMC / optimization / VB for as little as necessary. It takes rstanarm so long to install on Windows already that we do not have time for much testing.

On Sun, Oct 21, 2018 at 8:11 PM Imad Ali notifications@github.com wrote:

@bgoodri https://github.com/bgoodri for unit tests do I need to do every family-link combo or should I just do a single link function for each family (since all the combos are being tested by other unit tests and I'm reusing the stan family-link functions).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/207#issuecomment-431716208, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqlD4PjejBMdtxjwhKqTFw5QmiFp6ks5unQ0_gaJpZM4OaV0H .

imadmali commented 5 years ago

Ok. I'm not using any new likelihood functions and they seem to all be tested in test_stan_functions.R.

bgoodri commented 5 years ago

Sounds good

On Mon, Oct 22, 2018 at 2:38 PM Imad Ali notifications@github.com wrote:

Ok. I'm not using any new likelihood functions and they seem to all be tested in test_stan_functions.R.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/207#issuecomment-431929403, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqr3cpsdJ1IdwXFyInq14gmnaX70Rks5unhCOgaJpZM4OaV0H .