unconstrain_pars() sometimes returns infinite values

crsh commented 3 years ago

Summary:

unconstrain_pars() may return infinities for constrained samples that (possibly due to numerical problems) are on the bound of the parameter space. This can cause downstream problems, e.g., when attempting to use the unconstrained prameters for bridge sampling.

Description:

I fit a model using brms and tried to estimate the model's marginal likelihood using bridge_sampling(), but got the following error:

Error in out[!from@positive] <- -out[!from@positive] : 
  NAs are not allowed in subscripted assignments
In addition: Warning messages:
1: effective sample size cannot be calculated, has been replaced by number of samples. 
2: 1 of the 60000 log_prob() evaluations on the posterior draws produced -Inf/Inf. 
3: 100 of the 60000 log_prob() evaluations on the proposal draws produced -Inf/Inf. 
Error: Bridgesampling failed. Perhaps you did not set 'save_pars = save_pars(all = TRUE)' when fitting your model?

After some digging, I realized that the reason was an infinite value returned by unconstrain_pars(). Specifically, in one MCMC draw contains a correlation coefficient estimate of exactly 1 was backtransformed via atanh() to Inf.

Reproducible Steps:

The model was fit using cmdstanr, so this requires a recent version of brms. I use:

 package        * version    date       lib source        
 brms           * 2.16.1     2021-08-23 [1] CRAN (R 4.1.1)

Download the stanfit-object here.

The following code reproduces the error in the bridge sample:

library("brms")
brms_samples <- readRDS("brms_samples.rds")
bridge_sampler(brms_samples)

The following shows where the infinite unconstrained value arises:

library("rstan")

# Internal brms code
samples <- brms:::restructure(brms_samples)
samples <- brms:::update_misc_env(samples)
samples <- samples$fit
stanfit_model <- samples

# Internal bridgesampling code
ex <- extract(samples, permuted = FALSE)
skeleton <- rstan:::.create_skeleton(samples@sim$pars_oi,
                                              samples@par_dims[samples@sim$pars_oi])
upars <- apply(ex, 1:2, FUN = function(theta) {
  unconstrain_pars(stanfit_model, rstan:::.rstan_relist(theta, skeleton))
})

Current Output:

Because the object returned by unconstrain_pars() does not give parameter names, I had to look up the parameter name in the constrained samples:

> which(is.infinite(upars), arr.ind = T)
         iterations chains
[1,] 131      14398      3

# Unconstrained samples that include parameter names
ex2 <- extract(samples, c("b", "Intercept", "sd_1", "z_1", "L_1"), permuted = FALSE)

# The index is shifted by 1 becasue of the additional `cholesky_factor_corr`
# components in the constrained smaples
> ex2[14398, 3, 132, drop = FALSE]
, , parameters = L_1[2,1]

          chains
iterations chain:3
      [1,]       1

> atanh(ex2[14398, 3, 132])
[1] Inf

Expected Output:

An appropriate finite value.

RStan Version:

2.21.2

R Version:

4.1.1 (2021-08-10)

Operating System:

Ubuntu 18.04

bob-carpenter commented 3 years ago

Thanks for posting the issue, especially for digging into the root cause.

There's no way, in general, to avoid underflow and overflow in floating point. It seems like this problem may be fixable because it looks like the values you are trying to unconstrain originally came from samples. Those should have finite inverses if we get the numerics right. For arbitrary new inputs, there's no way to guarantee finiteness without harming other numerical properties (or at least we've never figured out how to do it---just putting finite upper or lower bounds leads to all kinds of problems with monotonicity and derivatives).

For the example code at hand, would it be possible to just use the original unconstrained variables rather than round tripping?

I'm pinging @paul-buerkner so he doesn't miss a brms issue.

paul-buerkner commented 3 years ago

Thank you for your input, Bob. I don't think this can be fixed from the brms side either as the relevant rstan functions are called inside bridgesampling and brms merely passes stuff to the corresponding stanfit method.

bob-carpenter commented 3 years ago

It seems like the fundamental problem is trying to go from unconstrained to constrained and then back again. That round trip should work, but apparently it doesn't here. But it doesn't look like there's an easy way to grab the unconstrained params. Maybe @bgoodri will have some idea.

@crsh: Do you have any idea which transforms are failing. If we can isolate the problem, we might be able to fix on the Stan side in our constrain/unconstrain code.

crsh commented 3 years ago

It seems like the fundamental problem is trying to go from unconstrained to constrained and then back again.

That was my impression as well.

@crsh: Do you have any idea which transforms are failing. If we can isolate the problem, we might be able to fix on the Stan side in our constrain/unconstrain code.

Well, in the case above I was under the impress that it was the atanh-transform, but I assume this could come up in other transformations as well. I hit this problem during a simulation study and I've had a few more runs that errored (assumedly for the same reason), but the fit above was the only one I have been able to dig into thus far. I can try to inspect other fits, but I probably won't get around to it before the end of next week. I expect that if there is another parameter in my model that could cause issues, it would be the positively constrained standard deviations (e.g., when 0 is unconstrained via log(0)).

bob-carpenter commented 3 years ago

We don't use atanh. The full set of transforms for constraints is documented in the Reference Manual chapter on Constraint Transforms. The inverse logit for upper and lower-bounded is documented in this section.

We use exp() for upper- or lower-bounded.

Where do the values being mapped come from? If they are constrained values for draws from the posterior, they should be mathematically invertible. If they're new values, then there may be no way to deal with the overflow.

One thing to do is regularize with priors to keep estimates from degenerating to boundaries.

crsh commented 3 years ago

We don't use atanh. The full set of transforms for constraints is documented in the Reference Manual chapter on Constraint Transforms.

Sorry, I'm a little confused. From what I understand, the section you link to lists the inverse hyperbolic tangent transform for cholesky-factorized correlation matrices. The variable L_1[2,1] that is transformed to infinity is the off-diagnoal of a 2x2 correlation matrix. I also verfied that applying atanh to other draws yields the same values as those returned by unconstrain_pars(). What am I missing?

One thing to do is regularize with priors to keep estimates from degenerating to boundaries.

Good point. In my case, there's definitley room to make the LKJ-prior more informative.

bob-carpenter commented 3 years ago

lists the inverse hyperbolic tangent transform for cholesky-factorized correlation matrices.

I was the one who was confused. I thought we were talking about an upper-and-lower bounded scalar, for which we use inverse logit.

For a 2 x 2 correlation matrix, that off-diagonal in the Cholesky is the only free parameter, because the rows have to be unit length with positive diagonals. What must be happening is that it's running off to +/- 1 correlation, which is toward +/- inf on the unconstrained scale. So is the value being unconstrained equal to -1 or 1? If so, there's no finite transform. Regularization could help here.

The other way to parameterize a 2 x 2 correlation matrix is with a correlation constrained to be between [-1, 1],

parameters {
  real<lower=-1, upper=1> rho;
}
transformed parameters {
  corr_matrix[2] Sigma = [[1, rho], [rho, 1]];
}
model {
  (rho + 1) / 2 ~ beta(a, b);  // affine rescaling doesn't need Jacobian

The cost to factor a 2 x 2 covariance matrix is pretty trivial.

crsh commented 3 years ago

So is the value being unconstrained equal to -1 or 1?

Yes, it's exactly 1 and transforms to inf.

Thanks for the suggested parameterization. Given that samples this extreme occur only very rarely in my case, I think I will simply regularize by using a more informative prior so I can stick with the brms interface for my simulations.

If there is no other way to address this, I wonder how bridgesampling could address this issue on their end. Could the bridgesampler ignore draws that contain infinities? This will probably bias the estimates, but may not be a big problem as long as the number of discarded samples is low (in this case it's 1/120.000)?

bob-carpenter commented 3 years ago

I'm afraid I don't know. I think this'll have to be an issue for whichever package is using bridgesampling. It may be that they can bypass the round trip to constrained parameters. I'm pinging @paul-buerkner and @bgoodri and @jgabry who should have a better sense than me if and where this can be addressed.

stan-dev / rstan