stan-dev / math

The Stan Math Library is a C++ template library for automatic differentiation of any order using forward, reverse, and mixed modes. It includes a range of built-in functions for probabilistic modeling, linear algebra, and equation solving.
https://mc-stan.org
BSD 3-Clause "New" or "Revised" License

"numeric overflow" error for models with von Mises distributions and large kappa values #1076

Open syclik opened 5 years ago

syclik commented 5 years ago

From @a-hurst on December 4, 2018 22:59

Summary:

Attempting to run a model that uses the von_mises_lpdf() function will cause the sampler to error out immediately if kappa values are too large (~10 or larger).

Description:

When trying to run a model that uses a von Mises distribution via the von_mises_lpdf() function, the sampler crashes immediately with an exception from a Boost function (cyl_bessel_i) reporting a 'numeric overflow' when kappa values are too high. This appears to be due either to an issue with how Stan uses the Boost function, or to a bug in the cyl_bessel_i function itself.
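For context on where the overflow comes from: the von Mises log density is kappa * cos(x - mu) - log(2*pi) - log(I0(kappa)), and evaluating I0(kappa) itself in double precision overflows once kappa is large enough, even though log I0(kappa) remains perfectly representable. The sketch below is illustrative Python, not Stan's or Boost's actual implementation; the series evaluation and large-kappa asymptotic are standard identities, but the overflow guard merely mimics Boost's error behavior:

```python
import math

def log_i0_naive(kappa):
    """log I0(kappa) via the power series for I0 (illustrative, not Boost's
    algorithm). The series terms themselves overflow once kappa is large,
    mimicking the 'numeric overflow' error raised by cyl_bessel_i."""
    term, total, k = 1.0, 1.0, 0
    while True:
        k += 1
        term *= (kappa / (2.0 * k)) ** 2   # ratio of consecutive series terms
        if term > 1e308:                   # next term exceeds double range
            raise OverflowError("numeric overflow in I0 series")
        total += term
        if term < 1e-16 * total:
            return math.log(total)

def log_i0_asymptotic(kappa):
    """Large-kappa asymptotic log I0(kappa) ~ kappa - 0.5*log(2*pi*kappa),
    finite for any double-precision kappa."""
    return kappa - 0.5 * math.log(2.0 * math.pi * kappa)

def von_mises_lpdf(x, mu, kappa, log_i0):
    """von Mises log density, parameterized by a log-I0 implementation."""
    return kappa * math.cos(x - mu) - math.log(2.0 * math.pi) - log_i0(kappa)
```

Computing on the log scale throughout (as the log modified Bessel function discussed later in this thread does) avoids the overflow entirely; the failure mode here is that some code paths still evaluate I0(kappa) directly.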

This is an issue that has come up multiple times over the past few years, from several threads in the old Stan Google Groups page (1, 2, 3) and on the new Stan forum (4). To my knowledge, a formal bug report has yet to be filed for it. I'm new to Stan (just trying to make sure someone else's analysis is reproducible before I upload everything to OSF), so the linked threads above are probably a better source for info on the issue than what I've provided here.

Thanks in advance for the help!

Reproducible Steps:

Here is the model in full:

data{
  int<lower=0> N ;
  int<lower=0> L ;
  real lrt[L] ;
  real angle[L] ;
  int<lower=1> id[L] ;
  int<lower=1,upper=3> cue[L] ;  // invalid/valid/neutral
}

transformed data{
  vector[10] zeros ;
  real neglog2pi ;
  neglog2pi = -log(2.0 * pi()) ; // log-probability of uniform component (it's data invariant)
  zeros = rep_vector(0,10) ;
}

parameters{
  vector[10] within_means ;
  vector<lower=0>[10] within_sds ;
  // correlation
  corr_matrix[10] cor ;
  //dummy variable for matt trick
  matrix[N,10] beta;
}

transformed parameters{
vector[L] p ; // for storing log-probabilities associated with model
{
  // id-level parameter values
  vector[N] id_logit_rho_intercept ;
  vector[N] id_logit_rho_cuing1 ;
  vector[N] id_logit_rho_cuing2 ;
  vector[N] id_log_kappa_intercept ;
  vector[N] id_log_kappa_cuing1 ;
  vector[N] id_log_kappa_cuing2 ;
  vector[N] id_lrt_mean_intercept ;
  vector[N] id_lrt_mean_cuing1 ;
  vector[N] id_lrt_mean_cuing2 ;
  vector[N] id_lrt_sd ;
  // id-level cell values as matrix
  vector[N] id_logit_rho[3] ;
  vector[N] id_log_kappa[3] ;
  vector[N] id_lrt_mean[3] ;
  //useful transformations
  vector[N] id_kappa[3] ;
  vector[N] id_rho[3] ;

  //convert from beta scale to observed scale & add group effects
  id_logit_rho_intercept = beta[,1] * within_sds[1] + within_means[1] ;
  id_logit_rho_cuing1    = beta[,2] * within_sds[2] + within_means[2] ;
  id_logit_rho_cuing2    = beta[,3] * within_sds[3] + within_means[3] ;
  id_log_kappa_intercept = beta[,4] * within_sds[4] + within_means[4] ;
  id_log_kappa_cuing1    = beta[,5] * within_sds[5] + within_means[5] ;
  id_log_kappa_cuing2    = beta[,6] * within_sds[6] + within_means[6] ;
  id_lrt_mean_intercept  = beta[,7] * within_sds[7] + within_means[7] ;
  id_lrt_mean_cuing1     = beta[,8] * within_sds[8] + within_means[8] ;
  id_lrt_mean_cuing2     = beta[,9] * within_sds[9] + within_means[9] ;
  id_lrt_sd = exp(beta[,10] * within_sds[10] + within_means[10]) ;

  //compute values for each cell
  id_lrt_mean[1] = id_lrt_mean_intercept + id_lrt_mean_cuing2;
  id_lrt_mean[2] = id_lrt_mean_intercept - id_lrt_mean_cuing1;
  id_lrt_mean[3] = id_lrt_mean_intercept;

  id_logit_rho[1] = id_logit_rho_intercept + id_lrt_mean_cuing2;
  id_logit_rho[2] = id_logit_rho_intercept - id_lrt_mean_cuing1;
  id_logit_rho[3] = id_logit_rho_intercept;

  id_log_kappa[1] = id_log_kappa_intercept + id_lrt_mean_cuing2;
  id_log_kappa[2] = id_log_kappa_intercept + id_lrt_mean_cuing1;
  id_log_kappa[3] = id_log_kappa_intercept;

  //compute the transforms
  for(i in 1:3){
    id_kappa[i] = exp(id_log_kappa[i]) ;
    for(n in 1:N){
      id_rho[i][n] = inv_logit(id_logit_rho[i][n]) ;
    }
  }

  //iterate over trials (this version doesn't have pre-computed id cell values)
  for(l in 1:L){
    // if (id_kappa[cue[l]][id[l]] > 10) {
    //   p[l] = normal_lpdf(lrt[l] | id_lrt_mean[cue[l]][id[l]], id_lrt_sd[id[l]])
    //   + log_mix(
    //     id_rho[cue[l]][id[l]]
    //     , normal_lpdf(angle[l] | pi(), sqrt(1/id_kappa[cue[l]][id[l]]))
    //     , neglog2pi
    //   );
    // } else {
      p[l] = normal_lpdf(lrt[l]|id_lrt_mean[cue[l]][id[l]], id_lrt_sd[id[l]])
      + log_mix(
        id_rho[cue[l]][id[l]]
        , von_mises_lpdf(angle[l]|pi(), id_kappa[cue[l]][id[l]])
        , neglog2pi
      );
    // }
  }
}
}

model{
  //priors
  within_means ~ student_t(4,0,1) ;
  within_sds ~ student_t(4,0,1) ;
  within_means[1] ~ student_t(4,3,3) ; //logit-rho intercept
  within_means[4] ~ student_t(4,3,3) ; //log-kappa intercept
  cor ~ lkj_corr(4) ;
  //assert sampling of each id's betas from multivariate student-t with df=4
  for(this_id in 1:N){
    beta[this_id,] ~ multi_student_t(4,zeros,cor) ;
  }
  //update the log-probability from p (defined in transformed parameters)
  target += p ;//used to be: increment_log_prob(p) ;
}

Here's the R code being used to run the model:

# Upload stan_data ----
load("endo_data_for_stan.rdata")

# Load Packages ----
library(tidyverse)
library(rstan)

# Run Stan Model 1 ----
mod = rstan::stan_model("_endo_model.stan")

post = sampling(
  mod
  , data_for_stan
  , pars = c("p")
  , include = FALSE
  , chains = 10
  , cores = 10
  , iter = 2000
  , refresh = 1
  , verbose = T
  , control = list(
    adapt_delta=0.99
    # , max_treedepth = 15
    )
)

I have an Rdata with all the data for the model as well, but I'm not sure how best to share that.

Current Output:

Here's what the output looks like (number of chains dropped to 1 for readability):


CHECKING DATA AND PREPROCESSING FOR MODEL '_endo_model' NOW.

COMPILING MODEL '_endo_model' NOW.

STARTING SAMPLER FOR MODEL '_endo_model' NOW.

SAMPLING FOR MODEL '_endo_model' NOW (CHAIN 1).
Chain 1: Unrecoverable error evaluating the log probability at the initial value.
Chain 1: Exception: Error in function boost::math::cyl_bessel_i<double>(double,double): numeric overflow  (in 'model1b1d79c30c67__endo_model' at line 120)

[1] "Error in sampler$call_sampler(args_list[[i]]) : "
[2] "  Exception: Error in function boost::math::cyl_bessel_i<double>(double,double): numeric overflow  (in 'model1b1d79c30c67__endo_model' at line 120)"
error occurred during calling the sampler; sampling not done

RStan Version:

2.18.2

R Version:

R version 3.5.1 (2018-07-02)

Operating System:

macOS 10.14.1

Copied from original issue: stan-dev/rstan#593

syclik commented 5 years ago

@a-hurst, thank you very much for reporting. I'm going to move it to the Math library because that's where it'll need to be fixed.

There's going to be a little noise on this issue (the issue mover will close this automatically and I want to reopen it). I'd like the issue to remain open here until we can verify that it's been fixed properly.

a-hurst commented 5 years ago

So I did a bit of digging into this today to see if I could find anything helpful. Here's what I learned:

• The bug isn't macOS-specific: it happens with RStan on Ubuntu 16.04 LTS as well.
• The bug only happens sometimes: if I try to run the sampling function a couple of times, sometimes it'll work and the chain will start and proceed just fine. I tried running the model with 10 chains, and only 8 of them crashed immediately with the 'numeric overflow' exception; the 2 that didn't proceeded normally until I left for the day. In other words: if it doesn't crash the chain at the very start, it doesn't crash the chain at all.
• It looks like there was a merged pull request from a little over a year ago (#640) that was meant to deal with "numeric instability" in the von_mises_lpdf() function by writing a new, more robust log modified Bessel function to replace the problematic one from boost. However, boost's cyl_bessel_i() function is still used indirectly in two places in von_mises_lpdf() in certain cases (which I don't fully understand from my reading of the code, but they seem to have something to do with whether kappa is a constant or not):

https://github.com/stan-dev/math/blob/28a607896611079ec75a25032b07fad496a8e897/stan/math/prim/scal/prob/von_mises_lpdf.hpp#L82-L87

The return values from the modified_bessel_first_kind() calls (which is just a wrapper for cyl_bessel_i()) only appear to be used when kappa_const is false:

https://github.com/stan-dev/math/blob/28a607896611079ec75a25032b07fad496a8e897/stan/math/prim/scal/prob/von_mises_lpdf.hpp#L102-L104

Could the improved Bessel function provided by @bgoodri be adapted to replace the invocations of modified_bessel_first_kind() here? If not, in what cases would the conditionals that cause modified_bessel_first_kind() to run be false when using von_mises_lpdf(), so that the bug could be avoided that way while waiting for a more permanent fix?

Hope any of this is helpful in finding a fix!

bob-carpenter commented 5 years ago

Thanks a lot for sharing what you found. I'm sure it'll help in debugging this.

In some cases, we just can't get the stability we need and have reverted to things like normal approximations. So we can branch on segments of the input domain if necessary.

a-hurst commented 5 years ago

@bob-carpenter @syclik Is there anything else I can do to help with this, whilst not being confident enough in my C or advanced math to try my hand at a fix myself?

I'm in a bit of a strange dilemma: The Stan code I posted above apparently ran without issue ~2 years ago with RStan 2.15 on a rented Amazon Linux VM (a colleague did the analysis for me), and I wrote up a whole paper based on the findings from those posterior samples. Now there's an upcoming special issue that the paper would be great for, with a submission deadline at the end of this month, but I can't seem to get the analysis code to run due to this bug and thus can't put it on OSF (my colleague says he remembers encountering the same bug on his MacBook, and doesn't remember how he got it to work on the Amazon VM). Since I'd ideally like my science to be reproducible, I'm willing to do any background work that would make a fix easier.

If it would help, I can post the .Rdata file with the successful posterior distribution here so you can look at the compiled model code from it (I've already confirmed that the Stan code in the object is identical to that posted above) and see if that sheds light on anything. I have yet to try running the old compiled model with its matching old version of RStan in a VM, but that's on my list of things to try next.

Thanks again!

bbbales2 commented 5 years ago

In other words: if it doesn't crash the chain at the very start, it doesn't crash the chain at all.

It sounds like the initial conditions are making it difficult to compute the log densities. It's totally possible that these numeric difficulties don't really exist where your posterior mass is and it's just an artifact of the random initialization.

You might be able to get away from these crazy initial values by just making the random initializations smaller. Check the "init_r" argument to stan in rstan. Maybe make it 1 instead of 2 (the default) and see if that gets rid of the problem. You can use custom initial values to try to track down more precisely what is blowing up here. You can also add prints to your model since you'll know pretty quickly if the model is or is not working (and check which parameters seem to cause the blow up).
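A back-of-the-envelope sketch of why init_r matters so much for this particular model (Python; worst_case_kappa is a hypothetical helper, and the bound ignores the cuing terms): unconstrained parameters initialize uniformly in (-init_r, init_r), and because within_sds has a lower=0 constraint it initializes as large as exp(init_r), so log kappa = beta * within_sd + within_mean can reach init_r * exp(init_r) + init_r:

```python
import math

def worst_case_kappa(init_r):
    """Hypothetical helper: rough upper bound on kappa at initialization for
    the model above, assuming every unconstrained parameter starts at the
    edge of (-init_r, init_r). within_sds has a lower=0 constraint, so it
    initializes up to exp(init_r); log kappa = beta * within_sd + within_mean
    then reaches init_r * exp(init_r) + init_r (cuing terms ignored)."""
    log_kappa = init_r * math.exp(init_r) + init_r
    return math.exp(log_kappa)
```

With init_r = 2 this worst case is about 2e7, far beyond kappa ~ 713 where I0(kappa) ~ exp(kappa)/sqrt(2*pi*kappa) overflows a double; with init_r = 1 it is only about 41.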

bob-carpenter commented 5 years ago

Sorry to hear about the reproducibility problems. Nothing major should've changed since RStan 2.15.

When you say it's crashing, you mean it's segfaulting or something and taking down the R environment? Or is it just failing to initialize? If it's the latter, you can probably get away with providing some inits for some values or just reducing the initialization range. Reparameterization can also help so that Stan's initial values are reasonable. For example, Stan initializes in (-2, 2) on the unconstrained scale, which isn't always reasonable.

If there's lingering instability, you can try something like normal approximations. Often these stability issues are for the equivalent of large values where the normal approximation is very close.

On Dec 10, 2018, at 10:39 PM, a-hurst notifications@github.com wrote:

So I did a bit of digging into this today to see if I could find anything helpful. Here's what I learned:

• The bug isn't macOS specific: it happens with RStan on Ubuntu 16.04 LTS as well.
• The bug only happens sometimes: if I try and run the sampling function a couple times, sometimes it'll work and the chain will start and proceed just fine. I tried running the model with 10 chains, and only 8 of them crashed immediately with the 'numeric overflow' exception, the 2 that didn't proceeded normally until I left for the day. In other words: if it doesn't crash the chain at the very start, it doesn't crash the chain at all.
• It looks like there was a merged pull request from a little over a year ago (#640) that was meant to deal with "numeric instability" in the von_mises_lpdf() function by writing a new and more robust log modified bessel function to replace the problematic one from boost. However, boost's cyl_bessel_i() function is still indirectly used twice in the von_mises_lpdf() function in certain cases (which I don't fully understand based on my reading of the code, but seem to have something to do with whether kappa is a constant or not): https://github.com/stan-dev/math/blob/28a607896611079ec75a25032b07fad496a8e897/stan/math/prim/scal/prob/von_mises_lpdf.hpp#L82-L87

the return values from the modified_bessel_first_kind() calls (which is just a wrapper for cyl_bessel_i()) only appear to be used in the case where kappa_const is false:

https://github.com/stan-dev/math/blob/28a607896611079ec75a25032b07fad496a8e897/stan/math/prim/scal/prob/von_mises_lpdf.hpp#L102-L104

Would the improved bessel function provided by @bgoodri be able to be adapted to replace the invocations of modified_bessel_first_kind() here? If not, in what use cases would those conditionals that cause modified_bessel_first_kind() to run be false when using von_mises_lpdf(), so that the bug might be avoided that way while waiting for a more permanent fix?

Hope any of this is helpful in finding a fix!


a-hurst commented 5 years ago

@bbbales2 @bob-carpenter Thank you both for your helpful replies. It sounds very likely that this is a problem with the initial parameter values sometimes starting too high or too low. Looking at the posterior object, it appears the default init radius of 2 was used for the successful run, but I'm going to try extracting the random seed and init values from it and see if the model runs now when I use them. Thanks again!

syclik commented 5 years ago

I think I've seen something similar and it's been fixed post v2.18, but we should verify.

I think what's going on is that certain exceptions aren't being caught, which ends up completely stopping the sampler from running. Instead, they should be caught and treated as rejections.

(With this particular issue, this might not have been a problem with the version of Boost we used in 2.15.)

a-hurst commented 5 years ago

If this is the case, should I try installing the latest Stan/RStan from the develop branch on GitHub and see if the bug still occurs?

EDIT: Just tried again with RStan installed from source using

remotes::install_github("stan-dev/rstan", ref = "develop", 
                        subdir = "rstan/rstan", build_opts = "")

and I get the same error:

Chain 3: Unrecoverable error evaluating the log probability at the initial value.
Chain 3: Exception: Error in function boost::math::cyl_bessel_i<double>(double,double): numeric overflow  (in 'model4b485d3c172d__endo_model' at line 120)

[1] "Error in sampler$call_sampler(args_list[[i]]) : "
[2] "  Exception: Error in function boost::math::cyl_bessel_i<double>(double,double): numeric overflow  (in 'model4b485d3c172d__endo_model' at line 120)"
Chain 6: Unrecoverable error evaluating the log probability at the initial value.
Chain 6: Exception: Error in function boost::math::cyl_bessel_i<double>(double,double): numeric overflow  (in 'model4b485d3c172d__endo_model' at line 120)

error occurred during calling the sampler; sampling not done
[1] "Error in sampler$call_sampler(args_list[[i]]) : "
[2] "  Exception: Error in function boost::math::cyl_bessel_i<double>(double,double): numeric overflow  (in 'model4b485d3c172d__endo_model' at line 120)"
error occurred during calling the sampler; sampling not done

Unless I didn't install the latest Stan/RStan from source properly, it looks like the bug is still present post 2.18 as well.

a-hurst commented 5 years ago

Okay, after some reading up, I understand now that the 'develop' branch of RStan != the 'develop' branch of Stan, and that I would need to use CmdStan somehow if I wanted to try out the bleeding-edge version. Given that I don't really understand how CmdStan works well enough to test out the model this way, I'm going to leave it for now.

Also, I just tried the suggestion of setting init_r to 1 and all 10 chains started properly, so that seems to confirm it's a problem with starting values being too extreme. Setting it to 1.5 made 2 of the 10 chains fail (as opposed to the usual ~8/10). Does changing this constraint have any notable effects on the output model that I should be worried about?

Oddly enough, I tried reusing the seed and inits from the old 2017 stanfit as arguments for the new one and I still got the exception on most of the chains, so the fact that it worked a couple versions ago doesn't seem to be due to some particularly lucky random values that avoided an overflow. This lends some support to @syclik's Boost theory, though the fact that this bug has been reported as early as 2016 in the old Stan mailing list would make that very odd, since it would have been broken, fixed, and then broken again by a later Boost release. Maybe the Amazon VM was using a custom version of Boost?

bbbales2 commented 5 years ago

Does changing this constraint have any notable effects on the output model that I should be worried about?

Shouldn't matter, but this is a function of your problem. Did changing this affect your conclusions at all? If it didn't, rock n' roll. Bessel functions and the like can be very finicky, so it's not surprising one might blow up if it gets a weird input. Since you're not seeing this problem after warmup, there's no need to worry. All sorts of weird things go on during warmup, which is why we don't use those samples in our posterior estimates.

As far as I know 2 is just an arbitrary number.

bob-carpenter commented 5 years ago

On Jan 10, 2019, at 11:46 AM, Ben Bales notifications@github.com wrote:

As far as I know 2 [the init range] is just an arbitrary number.

Arbitrary, but motivated by wanting to have roughly unit scale variation in the parameter inits assuming we'd have parameters on the unit scale and want diffuse inits to test convergence.

tbrown122387 commented 4 years ago

FWIW, I was brought here by the same error message, and for me, it wasn't really a bug. Personally, the reason I was getting it was because I was simulating from a nonsensical noninformative prior. I had one crazy variance parameter out of many, and that in turn generated some even crazier latent data.

bob-carpenter commented 4 years ago

Thanks for reporting. We're worrying more about prior predictive checks these days for just this reason.

On Jan 31, 2020, at 5:05 PM, Taylor R. Brown notifications@github.com wrote:

FWIW, I was brought here by the same error message, and for me, it wasn't really a bug. Personally, the reason I was getting it was because I was simulating from a nonsensical noninformative prior. I had one crazy variance parameter out of many, and that in turn generated some even crazier latent data.


venpopov commented 7 months ago

Fixed by #3036