add option for ASIS sampling to RW sampler?

nimble-dev / nimble

The base NIMBLE package for R

http://R-nimble.org

BSD 3-Clause "New" or "Revised" License

156 stars 23 forks source link

add option for ASIS sampling to RW sampler? #1437

Closed paciorek closed 4 months ago

paciorek commented 5 months ago

This modifies the RW sampler to allow for joint sampling of the target along with random effects whose mean is the target or whose sd is the target, i.e., a joint sampler for either mu, b[1:p] or sigma, b[1:p] where b[i] ~ dnorm(mu, sd = sigma).

The sampler adds sampling in the noncentered parameterization, assuming that the model is set up for centered parameterization and samplers are already being applied in the centered parameterization. In particular the sampler proposes mu* (or sigma*) and then deterministically sets b[i]* to be b[i] - mu + mu* or mu + (sigma*/sigma) (b[i] - mu). The Jacobian (in the latter case) is omitted because the code omits calculation of the b[i] prior from the Metropolis ratio, which the Jacobian would cancel with.

This is a special case (but covers much of the cases envisioned in the paper) of the ASIS sampling strategy of Yu and Meng (2011; 10.1198/jcgs.2011.203main). Hence naming the option 'ASIS'. That said, it's not very informative, but other options would probably be wordy (e.g, "joint_with_random_effects")

The main question here from an API perspective is whether we want this as a control option to the RW sampler. It is only relevant for some models and might be confusing to users. However, most of the code is identical to the RW sampler, so having as a separate sampler would involve a bunch of code duplication. @perrydv @danielturek, I would like feedback from you on this question.

Additional notes: 1.) TBD how much checking we do to ensure the random effect distribution is valid vs. telling the user what it should be and expecting them to check. 2.) It would be straightforward to extend to the case where b[i] ~ dnorm(mu + eta[i] + gamma*z[i], sigma) and the target is any of mu, gamma, or sigma. 3.) Adding an ASIS option to other scalar samplers, in particular slice, is straightforward (and I have draft code for it). Not sure if we want to do that too.

We are currently using this sampler to explore sampling in MSOMs with species-level random effects.

I cancelled CI testing for this as there is no testing code and in principle this has no effect on existing use of RW sampler.

danielturek commented 5 months ago

@paciorek Thanks for tagging me on this.

To address your highest level design questions upfront:

I agree, I feel this should be added as part of the RW sampler, rather than being its own standalone sampler. That degree of code duplication would not be ok.
Thus, yes, this sampling strategy would be invoked using flags in the control list, as you've implemented it.
My initial feeling is using some other name, instead of ASIS. Perhaps includeRE? I think that would be better.

Another design option to consider: perhaps only having one control argument, called includeRE. If it has the default value of FALSE, operation is the usual RW sampler. If includeRE is provided as TRUE, then we automatically do some model processing and checking, and self-determine asisType and asisEffects. And, if includeRE = TRUE but the model structure is not amenable to this, we throw an error in the setup function. I feel I somewhat prefer this, to make it (much) more automated. Discussion is welcomed, I know it reduces fine-grain-user-control, but it feels reasonable to me in this case.

Prior to line 256 (subsetting ccList$calcNodesNoSelf) would it be wise to force asisEffects through expandNodeNames)? When asisEffects is user-supplied, I imagine it would usually be only a variable name re, or perhaps indexed as re[1:n].

Would likes 256 and 257 be more clear (and perhaps more efficient) if written using setdiff ?

Also, it seems line 257 might have a typo. As currently written, wouldn't this give a warning about non-multiple vector lengths? Was it intended to be:

ccList$calcNodesNoSelf <- ccList$calcNodesNoSelf[!ccList$calcNodesNoSelf %in% asisEffects]

I agree a lot of the checks should be added, per the comments in the current code. Perhaps also add:

## TODO: check asisEffects none are data (quite defensive)
## TODO: check asisEffects all come from the same declID (perhaps more important)

The second of those, above, seems important to cover some non-standard cases, e.g. some of the random effects come from a different declaration for example:

for(i in     1:r)   b[i] ~ dnorm(target,                  sd = whatever)
for(i in (r+1):n)   b[i] ~ dnorm(target + something_else, sd = whatever)

or one could imagine where the mean is target in both cases, but the sd arguments are different: whatever1 and whatever2.

Documentation at the end of MCMC_samplers.R should be updated, to include the Yu and Meng (2011) reference.

I would please request some formatting changes, to be consistent with other code/samplers, but that can come later (and I'd be quite happy to make those changes).

Could a test be added, to demonstrate the correct asymptotic posterior is achieved?

paciorek commented 5 months ago

@danielturek thanks for this very useful and detailed feedback.

On the question of automated checking, I agree this would be nice. However, I am wary of (a) being confident about catching all the special cases given our past experience in analogous situations and (b) the extra work of doing that given other priorities. My thought was that we describe to users when the sampler is appropriate and move some of the burden onto the user, albeit with some double-checking.

danielturek commented 5 months ago

@paciorek What you said makes sense; I certainly understand your point that "in the past, we're never able to anticipate and handle all use-cases; users always come up with something we didn't imagine".

My only lingering hesitation, is this approach makes the new option a big step more annoying to use. It would be so much nicer to write:

conf$addSampler("mu", "RW", includeRE = TRUE)

In the end, I'm fine either way.

paciorek commented 5 months ago

A middle ground might be to allow your example input but warn user in docs and, specifically, in a printed note what the sampler assumes, particularly for things we don't fully check. Let me think on that particular aspect more.

perrydv commented 5 months ago

Hi @paciorek @danielturek I am catching up here. A few thoughts:

We might want to clearly mark this as experimental or beta version or subject to change. That lightens the decision burden for initial release of the new feature.
I think both "asis" and "includeRE" seem cryptic. The way I think about it is that these are samplers in one dimension of transformed coordinate spaces. Maybe "RE_joint" or "RE_dnorm". Just starting to think about the name.
My initial though would be to make it a separate sampler. I can imagine growing complexity as we possibly allow log scale, changes that combine a shift and a scale, or the case of @paciorek's point 2 above. This could make the code for supporting it become non-trivial. We have the possibility that one sampler can contain an object of another sampler (or anything else), which can help reduce code reuse.
I think I see your point @paciorek about the cancellation of terms and hence the saved computing. But that does introduce the risk of getting wrong results if misused. If OTOH we do compute the priors of the "asisEffect" nodes (as currently named) and also the jacobian from the transformation (for scale) case, then we can make the acceptance ratio valid even if the sampler has been applied in the "wrong" context. I think we could go either way, since it will only be assigned manually (for now, anyway) and so can come with a warning to use it correctly.
Overall this has the feel of something where our thoughts will evolve a bit, so it seems possibly premature, but ok if clearly marked as beta.

paciorek commented 5 months ago

Here's a variation on some of the things under discussion.

We could have a distinct sampler, perhaps: sampler_joint_effects or sampler_noncentered_effects or sampler_joint_RE. In its setup we then have various options for checking and what not. In run it just calls out sampler_RW. (We could also allow a user to ask for various types of samplers on the top node, e.g. RW or slice, if we want a version of this that does slice sampling for top nodes.)

The actual sampling could done in our current sampler(s), in particular sampler_RW. Now that would mean that, as I have in the current PR, sampler_RW has some code blocks that are new and only for this use case. I don't see a way of avoiding that given the code blocks are embedded in the sampler. But if we have the user interface go through a separate sampler, then I think it's clearer to users what the sampler is, and we don't need to modify documentation for our core samplers. Of course we could also start out with the embedded sampler being a duplicate of sampler_RW but with the extra blocks and then combine into sampler_RW at a later date if avoiding code duplication seems like the right thing to do after further consideration. For the moment, actually, maybe that is the way to go.

To @perrydv 's point about correctness, yes, agreed and very good point. Where I am leaning at the moment is that we put in all the calculations to ensure correctness but then if we can in a simple fashion check and be confident that we are in a situation where we can leave calculations out, then we leave them out. I will also see, in the CDFW case, how much time is added to do the extra calculations - it may well be minimal.

paciorek commented 4 months ago

Based on this discussion and an offline discussion with Perry, here's what I've done:

create a sampler_noncentered that has an embedded RW or slice sampler, depending on what user specified in its control. The setup determines the dependent effects and passes them into setup for the embedded sampler.
modify both sampler_RW and sampler_slice to allow for joint sampling of target and set of dependent effects. This changes to code for these samplers and makes them less readable in terms of standard usage but avoids duplicated code. This choice can be revisited.
for robustness, as suggested by Perry, in all cases, we calculate the density of the transformed effects and the determinant of Jacobian of transformation, thereby avoiding making or checking assumptions about the distribution of the effects and whether that cancels with Jacobian of transformation
docs for RW and slice samplers will not discuss joint sampling as it is not a user-facing capability of these samplers. All documentation will be associated with sampler_noncentered.

Not yet done: writing documentation for sampler_noncentered and setting up tests.

paciorek commented 4 months ago

Ok, one update here. If we want to have a single sampler_RW that has regular sampler and allows the noncentered variation, we'd need a nimbleFunctionList to handle the fact that in run code we need to call getParam(node, 'mean') inside an if statement that is not active when using standard RW sampler.

At the moment this has tipped the balance into having our current sampler_RW and a new sampler_RW_noncentered and similarly for slice sampler. There will be a bunch of duplicated code. Could be revisited in the future.

paciorek commented 4 months ago

@perrydv @danielturek (@kenkellner in case you're interested given we've used this in the CDFW comparisons), I've cleaned up the noncentered sampler, adding roxygen and testing, and fixing somethings revealed in testing.

I think this is close to finished. The roxygen deserves some attention as I spent some time trying to describe the sampler carefully, but more eyes would help.

@danielturek please weigh in now on what formatting changes you'd like.

danielturek commented 4 months ago

@paciorek I looked over this carefully, and it looks great. This will be a nice addition.

perrydv commented 4 months ago

@paciorek I am taking a fresh look at this as we run up to a release.

FWIW, I could see this generalizing as a sampler that can shift and scale arbitrary other nodes, whether in the context of centered vs noncentered random effects or some other context. E.g., I once had success sampling regression coefficients in a rotated coordinate space where the rotation was known in advance, and that is a form of proposing one and shifting others accordingly. So the name "noncentered" feels like it might turn out to be too narrow, but I doubt we want to open that up right now. If we do something more general in the future, maybe we'll also settle on the issues of sharing implementation code while giving useful names.
I also realized a possible solution to the design issue that diverted you from having these steps live inside of an expanded sampler_RW. Since the outer object (sampler_noncentered) can use getParam to form the vector of relevant means if necessary, maybe it could just use a set_mean method in sampler_RW before running the sampler_RW, i.e. provide the information from above. But again I recognize we are not going to shift implementation for this release and sorry to keep opening up new ideas.
On a more practical level, the error-trap for specific known distributions that contain a mean parameter seems to preclude using this for a user-defined distribution. The same kind of issue came up recently in sampler_categorical, with the difference that the getParam calls were needed only in setup code. I'm thinking we could either remove this error-trap, or provide a control element to disable it, or try calling getParam in the setup code to see if there is a mean parameter. I feel moderately strongly that a user should be able to write a distribution with a mean and be able to use this sampler.
About the issue of whether to do the calculations that are unnecessary if the sampler is used in a correct context: we could provide a control element to allow the faster execution if one is certain the context allows it. We might find this useful and so might some users.
I will provide roxygen edits in some other mode.

perrydv commented 4 months ago

Also, how about shortening control list arguments:

samplerParam to param?
samplerType to either sampler or type?

paciorek commented 4 months ago

I've added a dynamic check for presence of a mean parameter.

As far as simpler names, we can't use type as that would conflict with the type='noncentered' if the user passes control args via .... I'll change to sampler and param.