nimble-dev / nimble

The base NIMBLE package for R
http://R-nimble.org
BSD 3-Clause "New" or "Revised" License

WIP: add Polya Gamma sampler #1439

Closed paciorek closed 4 months ago

paciorek commented 4 months ago

Do not merge. This is for review. I still need to add testing, both of sampling correctness and of the checking that the model's functional form is suitable for PG.

This PG sampler allows for logistic regression responses with normally distributed target parameters, including zero-inflation (e.g., occupancy models), stochastic binomial sizes (e.g., N-mixture models), and stochastic design matrix elements.

This takes the original sampler code from @paul-vdb and reworks some of the checking to catch more cases where PG is not appropriate, while also trying to improve efficiency.

In terms of checking, we use some of our conjugacy checking code to verify that the response probabilities are the product of a probability and one or more zero-inflation binary values, and that the probability is logit-linked to a linear predictor of the target elements.
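
For concreteness, here is a minimal sketch (illustrative only; the names z, psi, x, and y are not from this PR) of the kind of declaration that check is meant to recognize:

library(nimble)
## Response probability is a zero-inflation binary (z[i]) times a probability
## p[i] that is logit-linked to a linear predictor in the targets (beta0, beta1).
code <- nimbleCode({
    beta0 ~ dnorm(0, sd = 10)
    beta1 ~ dnorm(0, sd = 10)
    psi ~ dunif(0, 1)
    for(i in 1:n) {
        z[i] ~ dbern(psi)                    # zero-inflation indicator
        logit(p[i]) <- beta0 + beta1 * x[i]  # logit link to linear predictor
        y[i] ~ dbern(z[i] * p[i])            # probability times binary value
    }
})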

We manually fill in the X (design) matrix (in similar fashion to how we handle conjugacy in regression contexts) if it is not provided by the user, but we only do this if the user tells us X is fixed (i.e., no missing values in X and no stochastic indexing determining which row of X is relevant for a given response).
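
As a hypothetical illustration (not from the PR) of the distinction:

library(nimble)
## Fixed X: x is fully observed and row i of the implicit design matrix always
## corresponds to response i, so the sampler can fill in X automatically.
codeFixed <- nimbleCode({
    beta0 ~ dnorm(0, sd = 10)
    beta1 ~ dnorm(0, sd = 10)
    for(i in 1:n) {
        logit(p[i]) <- beta0 + beta1 * x[i]
        y[i] ~ dbern(p[i])
    }
})

## Stochastic indexing: which coefficient applies to y[i] depends on the latent
## group[i], so the relevant row of X is not fixed and cannot be pre-filled.
codeStoch <- nimbleCode({
    beta0 ~ dnorm(0, sd = 10)
    for(g in 1:G) b[g] ~ dnorm(0, sd = 1)
    for(i in 1:n) {
        group[i] ~ dcat(w[1:G])
        logit(p[i]) <- beta0 + b[group[i]]
        y[i] ~ dbern(p[i])
    }
})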

@paul-vdb would be great if you could take a careful look, but I understand you may have time limitations.

@perrydv @danielturek @kenkellner just flagging this for you if of interest.

paul-vdb commented 4 months ago

@paciorek Had a peek and it's looking great overall. I'll make some time tomorrow to give it a full check and play around with some examples. Great work getting it all cleaned up.

paciorek commented 4 months ago

Good, thanks @paul-vdb . If you can pass your examples on to me once you're done, they may help me in setting up tests.

paul-vdb commented 4 months ago

@paciorek I started checking this through. Looks like you started changing function names from 'get' to 'set' (better name!) but hadn't completed that. I pushed those changes.

I'll look through more thoroughly tomorrow. I've written an N-mixture model to test the stochastic binomial size. I see in the sandbox we have tests for a multivariate normal beta, standard logistic regression, and zero-inflation via occupancy. Have you run this with the MSOM in blocks, as previously discussed on the other thread?

paciorek commented 4 months ago

Actually, I was torn about get vs. set. For the design matrix, set seemed right, while for the others get seemed more appropriate, even though they set into member objects rather than returning the "gotten" info.

As far as the sandbox tests, thanks for reminding me of those. I will copy them over to the new nimble branch when I start working on tests.

As far as blocks, I actually decided that was a rabbit hole/red herring we shouldn't go down now. We'll expect that a user would assign PG as a block to all parameters involved in a given set of response values.

paul-vdb commented 4 months ago

@paciorek I actually think 'set' is good because those functions cache and don't actually return anything to 'get'.

paciorek commented 4 months ago

Ok, I'm fine with set.

paul-vdb commented 4 months ago

@paciorek I've reinstalled the branch but the PG sampler won't compile. It runs fine in R. Was it compiling locally for you? I tried two different examples. I got this error message both times. Not really sure where it's falling apart. Any ideas?

[screenshot: compilation error message]

paciorek commented 4 months ago

Sorry, I didn't fully check things. Can you send me your example and I will debug?

paul-vdb commented 4 months ago

Will push the changes I was just making now. They concern checking whether singleSize == TRUE: if the size is stochastic, then I think we don't need to check singleSize at all, and we also only need to run that check once.

library(nimble)

## Simulate N-mixture-style data: K sites, J replicate counts per site,
## with stochastic binomial size N[k] and a site-level covariate 'type'.
set.seed(1)
lambda <- 6
beta0 <- 0.5
beta1 <- 0.15
J <- 20   # replicates per site
K <- 10   # sites
type <- rep(0:1, each = K/2)   # site-level covariate (length K, indexed by site)
N <- rpois(n = K, lambda = lambda)
y <- matrix(NA, nrow = K, ncol = J)
for(i in 1:K){
  y[i,] <- rbinom(n = J, size = N[i], prob = expit(beta0 + beta1*type[i]))
}

code <- nimbleCode({
    beta0 ~ dnorm(0, sd = 10000)
    beta1 ~ dnorm(0, sd = 10)
    lambda ~ dgamma(1, 1)
    for (k in 1:K) {
      p[k] <- expit(beta0 + beta1*type[k])
      N[k] ~ dpois(lambda)
      for(j in 1:J) y[k,j] ~ dbinom(size = N[k], prob = p[k])
    }
})

# Polya gamma sampler:
mod <- nimbleModel(code = code, constants = list(J=J, K=K, type=type),
  data = list(y = y, N = rep(NA, K)),
  inits = list(beta0=0, beta1=0, lambda=5, N=apply(y,1,max)+5))
conf <- configureMCMC(mod, monitors = c('beta0', 'beta1', 'lambda'))
conf$removeSamplers(c("beta0", "beta1"))
conf$addSampler(type = "sampler_polyagamma", target = c("beta0", "beta1"))
Rmcmc <- buildMCMC(conf)
Cmod <- compileNimble(mod)
Cmcmc <- compileNimble(Rmcmc, project = mod)

paciorek commented 4 months ago

Ok, I've merged your changes with some of mine, including renaming away from setSize, because that is a function name we use in the DSL and it was causing a conflict.

As far as singleSize and stochSize, I agree it seems silly to check for a single size if we have, say, n[i]. But if we have a single size that is stochastic, such as dbin(n, p[i]), checking might save some computation. However, I don't know whether that is a realistic use case, so I'm not inclined to do anything about it for now.
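
For reference, a hypothetical sketch of the single-stochastic-size case mentioned above:

library(nimble)
## One stochastic size n shared by all responses, as in dbin(n, p[i]).
code <- nimbleCode({
    n ~ dpois(50)
    beta0 ~ dnorm(0, sd = 10)
    beta1 ~ dnorm(0, sd = 10)
    for(i in 1:I) {
        logit(p[i]) <- beta0 + beta1 * x[i]
        y[i] ~ dbin(prob = p[i], size = n)
    }
})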

paul-vdb commented 4 months ago

@paciorek good point. I suppose you could have a single stochastic size. Is it slower to repeatedly loop to check size in that case, or to keep size as a vector? I think it's splitting some hairs there for sure.

My fault for breaking the code when renaming the methods. I'll have another look at it today. Thanks

paul-vdb commented 4 months ago

@paciorek I ran a couple of models, read through what you've changed, and overall I think it looks really good. I assume we will find efficiencies in the future, but it seems quite usable as it is. I didn't see any documentation. The examples I ran were an N-mixture model (stochastic size) and random effects logistic regression (stochastic design), with variations of the latter where I applied a random walk sampler to the random effects and Polya-gamma to the fixed effects. Mixing the samplers didn't break anything, which was good too! It will be interesting to see this on bigger data problems where any inefficiencies would be more visible.

paciorek commented 4 months ago

@paul-vdb @perrydv @danielturek @kenkellner This adds an extensive set of tests as well as documentation for the Polya-gamma sampler.

A few comments:

paul-vdb commented 4 months ago

@paciorek

Looks great overall. I couldn't see anything wrong with it, and you've highlighted the sticky points above. The documentation looks clear (at least to me, as someone already familiar), but comments from someone less familiar might be helpful.

Minor comment: In the future it might be good to put the get_param functions in another file for more general use. I'll want to use them with Laplace for prior Hessian computation in the future.

On the checks, I think you've given the user enough options to skip checking and just run it on faith. For example, when they've written the model as z[i]*expit(b0+b1*x[i]), they can set check = FALSE; so we do actually allow that form, we just can't flag it as okay right now.
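
For example, a usage sketch along the lines of the earlier configuration code (b0 and b1 are placeholder parameter names; check is the flag discussed above):

## Bypass the functional-form check for a model written inline as
## y[i] ~ dbern(z[i] * expit(b0 + b1*x[i])), which can't yet be flagged as ok.
conf$removeSamplers(c("b0", "b1"))
conf$addSampler(type = "sampler_polyagamma", target = c("b0", "b1"),
                control = list(check = FALSE))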

I read through the tests you provided, and they look well thought out and cover most of the key pieces.

kenkellner commented 4 months ago

@paciorek

The docs were very clear to me as someone less familiar with PG. In my experience, specifically from an occupancy perspective, I don't think I've seen anyone use a z[i]*expit(b0+b1*x[i]) style declaration so I doubt that will be a major issue.

paciorek commented 4 months ago

@paul-vdb I agree about moving the get_param functions. That said, they shouldn't go in the Laplace files, because those will eventually be moved out into nimbleQuad. I'll think about where they might better belong.

perrydv commented 4 months ago

@paciorek I've had a pretty good look at this.

I realize some of these may not be worth time in the immediate crunch.

I also put in some timers for various steps, and here is what I came up with:

paciorek commented 4 months ago

Regarding setup efficiency, I cleaned up the case you pointed out and another one I found. If you notice anything else specific, LMK.

Regarding reliability, the idea was to only check one case because otherwise the checking could become quite burdensome for large models like our occupancy examples. I thought I had protected things, in terms of it being ok to just check the first case, by checking that all the response nodes come from the same declaration, but I realized it's really rather more complicated. I've added a bunch more checking that aims to ensure that one can just use the first case for the checking. It's all getting rather involved, which feels dangerous/finicky, but I don't see another way to avoid making the setup quite time-consuming.

Regarding more user control, given our time crunch I'm inclined to leave those aside. In particular, allowing users to override the calculation feels dangerous, as it depends on knowing the implications of the exact sampler ordering. We could make the same argument about conjugate samplers too...

Single iterations are used because those tests just check initialization of designMatrix and related things, so we only need to make sure the initialization code runs.

As far as timing, that is consistent with the timing I got for our occupancy examples. I'd be curious about your run-time efficiency ideas as I put some effort into improving efficiency already.

perrydv commented 4 months ago

Thanks. The reason I was thinking it might not be too dangerous to give an option to skip the calculation at the end is that in a typical case the "y[i]" will be data and there shouldn't be anything else that depends on any of the parameters (with the possible exception of posterior predictive nodes). But you make the final call for this release.

perrydv commented 4 months ago

Here is maybe a more noticeable thing: is the restriction to >= 2 parameters due to the vector/scalar hassles? If so, I could probably insert the "one_time_fixes" approach and allow it to work for a single parameter as well. How important is this?
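
For readers unfamiliar with the idiom, a rough and hypothetical skeleton of the one_time_fixes pattern (simplified; not the PR's actual code): setup pads a would-be scalar to a length-2 vector so it compiles as a vector, and the first run() call shrinks it back.

library(nimble)
sampler_demo <- nimbleFunction(
    contains = sampler_BASE,
    setup = function(model, mvSaved, target, control) {
        d <- length(model$expandNodeNames(target))
        ## pad to length 2 so the compiler sees a vector, not a scalar
        scaleVec <- if(d == 1) c(1, 1) else rep(1, d)
        one_time_fixes_done <- FALSE
    },
    run = function() {
        if(!one_time_fixes_done) one_time_fixes()
        ## (actual sampling logic omitted in this sketch)
    },
    methods = list(
        one_time_fixes = function() {
            if(d == 1) setSize(scaleVec, 1)  # shrink padded vector back to length 1
            one_time_fixes_done <<- TRUE
        },
        reset = function() {}
    )
)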

paciorek commented 4 months ago

Regarding skipping the calculate: what happens when we then go to sample parameters in the prior of the PG target nodes? They won't see the correct logProb for their dependencies (the PG target nodes). I suppose you were thinking we could run calculate just on the target nodes at the end rather than on the full calcNodes? Or what if this is being done on occupancy linear predictor parameters? Then when the zero-inflation ("z") is sampled via our binary sampler, it won't have the correct logProb values. I might be missing something.

Regarding >= 2 parameters, yes that is vector/scalar hassles, perhaps most importantly in the matrix/dmnorm calcs involved in the conjugate update for the target.

paciorek commented 4 months ago

@perrydv I'm inclined to merge this and not deal with the scalar target case, as that's not just a one-time-fixes thing; it means adding run code to do the math with dnorm.

paciorek commented 4 months ago

I added handling of a scalar parameter. Good thing I did, as I also realized we weren't handling a scalar/vector issue related to a single (vector) target, with nodeLengths and normTypes potentially being scalars.

paciorek commented 4 months ago

Merging this as test failures appear to be unrelated.