Eigen assertion failure in `hcubature`

WardBrian commented 1 month ago

Description

Reported on the forums: https://discourse.mc-stan.org/t/using-the-wiener-lpdf-function-in-the-cmdstanr-package-to-implement-a-diffusion-model-the-construction-of-a-markov-chain-was-unsuccessful/35213/3

Example

Running the above model leads to an assert with the following backtrace

Details

``` level1: stan/lib/stan_math/lib/eigen_3.4.0/Eigen/src/Core/DenseCoeffsBase.h:410: Eigen::DenseCoeffsBase::Scalar& Eigen::DenseCoeffsBase::operator[](Eigen::Index) [with Derived = Eigen::Matrix; Eigen::DenseCoeffsBase::Scalar = double; Eigen::Index = long int]: Assertion `index >= 0 && index < size()' failed. Program received signal SIGABRT, Aborted. __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737352369984) at ./nptl/pthread_kill.c:44 44 ./nptl/pthread_kill.c: No such file or directory. (gdb) bt #0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737352369984) at ./nptl/pthread_kill.c:44 #1 __pthread_kill_internal (signo=6, threadid=140737352369984) at ./nptl/pthread_kill.c:78 #2 __GI___pthread_kill (threadid=140737352369984, signo=signo@entry=6) at ./nptl/pthread_kill.c:89 #3 0x00007ffff7842476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26 #4 0x00007ffff78287f3 in __GI_abort () at ./stdlib/abort.c:79 ^[[B^[[A#5 0x00007ffff782871b in __assert_fail_base (fmt=0x7ffff79dd130 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x555555713be4 "index >= 0 && index < size()", file=0x555555713110 "stan/lib/stan_math/lib/eigen_3.4.0/Eigen/src/Core/DenseCoeffsBase.h", line=410, function=) at ./assert/assert.c:92 #6 0x00007ffff7839e96 in __GI___assert_fail (assertion=0x555555713be4 "index >= 0 && index < size()", file=0x555555713110 "stan/lib/stan_math/lib/eigen_3.4.0/Eigen/src/Core/DenseCoeffsBase.h", line=410, function=0x555555714348 "Eigen::DenseCoeffsBase::Scalar& Eigen::DenseCoeffsBase::operator[](Eigen::Index) [with Derived = Eigen::Matrix; Eigen::DenseCoeffsBase::Scalar = doub"...) at ./assert/assert.c:101 #7 0x00005555555b6ee4 in Eigen::DenseCoeffsBase, 1>::operator[] (index=, this=0x55555584eed0) at stan/lib/stan_math/lib/eigen_3.4.0/Eigen/src/Core/DenseCoeffsBase.h:410 #8 _ZN4stan4math9hcubatureIZZNS0_8internal17wiener7_integrateILNS2_12GradientCalcE0ELS4_0EZNS0_11wiener_lpdfILb0EdddddNS0_9var_valueIdvEES7_S7_EEDaRKT0_RKT1_RKT2_RKT3_RKT4_RKT5_RKT6_RKT7_RKdEUlDpOT_E11_RdJRSt5tupleIJdddddddddEEiRN5Eigen6MatrixIdLin1ELi1ELi0ELin1ELi1EEES19_RKiSX_dEEEDaSD_OSE_DpOT3_ENKUlS10_E_clIJS15_RiS19_S19_S1B_SX_S12_EEEDaS10_EUlOT_OS8_OSB_S1C_OSH_OSK_OSN_OSQ_OST_OT8_E_ddS14_ddEEDaRKS1J_SG_iRKNS17_IS8_Lin1ELi1ELi0ELin1ELi1EEERKNS17_ISB_Lin1ELi1ELi0ELin1ELi1EEEiSH_SK_ (integrand=..., pars=std::tuple containing = {...}, dim=dim@entry=1, a=..., b=..., max_eval=max_eval@entry=6000, reqAbsError=reqAbsError@entry=0, reqRelError=reqRelError@entry=4.5000000000000003e-05) at stan/lib/stan_math/stan/math/prim/functor/hcubature.hpp:460 #9 0x00005555555ba715 in _ZZN4stan4math8internal17wiener7_integrateILNS1_12GradientCalcE0ELS3_0EZNS0_11wiener_lpdfILb0EdddddNS0_9var_valueIdvEES6_S6_EEDaRKT0_RKT1_RKT2_RKT3_RKT4_RKT5_RKT6_RKT7_RKdEUlDpOT_E11_RdJRSt5tupleIJdddddddddEEiRN5Eigen6MatrixIdLin1ELi1ELi0ELin1ELi1EEES18_RKiSW_dEEEDaSC_OSD_DpOT3_ENKUlSZ_E_clIJS14_RiS18_S18_S1A_SW_S11_EEEDaSZ_ (__closure=) at stan/lib/stan_math/stan/math/prim/prob/wiener_full_lpdf.hpp:141 #10 _ZN4stan4math8internal23estimate_with_err_checkILm0ELm8ELNS1_12GradientCalcE0ELb1ERKZNS1_17wiener7_integrateILS3_0ELS3_0EZNS0_11wiener_lpdfILb0EdddddNS0_9var_valueIdvEES7_S7_EEDaRKT0_RKT1_RKT2_RKT3_RKT4_RKT5_RKT6_RKT7_RKdEUlDpOT_E11_RdJRSt5tupleIJdddddddddEEiRN5Eigen6MatrixIdLin1ELi1ELi0ELin1ELi1EEES19_RKiSX_dEEEDaSD_OSE_DpOT3_EUlS10_E_S12_JS15_RiS19_S19_S1B_SX_S12_EEEDaOSH_OSK_DpOT5_ (err=@0x7fffffffb638: -15.431956022972571, functor=...) at stan/lib/stan_math/stan/math/prim/prob/wiener5_lpdf.hpp:633 #11 _ZN4stan4math8internal17wiener7_integrateILNS1_12GradientCalcE0ELS3_0EZNS0_11wiener_lpdfILb0EdddddNS0_9var_valueIdvEES6_S6_EEDaRKT0_RKT1_RKT2_RKT3_RKT4_RKT5_RKT6_RKT7_RKdEUlDpOT_E11_RdJRSt5tupleIJdddddddddEEiRN5Eigen6MatrixIdLin1ELi1ELi0ELin1ELi1EEES18_RKiSW_dEEEDaSC_OSD_DpOT3_ (hcubature_err=@0x7fffffffb638: -15.431956022972571, wiener7_functor=...) at stan/lib/stan_math/stan/math/prim/prob/wiener_full_lpdf.hpp:162 #12 stan::math::wiener_lpdf, stan::math::var_value, stan::math::var_value > (y=, a=@0x5555558281b0: 1, t0=@0x5555558281d8: 0.29999999999999999, w=@0x5555558281b8: 0.45000000000000001, v=@0x555555831550: 3.5, sv=..., sw=..., st0=..., precision_derivatives=@0x7fffffffba70: 0.0001) at stan/lib/stan_math/stan/math/prim/prob/wiener_full_lpdf.hpp:600 #13 0x00005555555cfe5b in level1_model_namespace::level1_model::log_prob_impl, -1, 1, 0, -1, 1>, Eigen::Matrix, (void*)0, (void*)0, (void*)0> (this=0x555555828120, params_r__=..., params_i__=..., pstream__=) at level1.hpp:413 #14 0x00005555555d06d3 in level1_model_namespace::level1_model::log_prob > (pstream=, params_r=..., this=) at level1.hpp:678 #15 stan::model::model_base_crtp::log_prob_propto_jacobian (this=, theta=..., msgs=) at stan/src/stan/model/model_base_crtp.hpp:132 #16 0x000055555566d792 in stan::model::model_base::log_prob > (msgs=, params_r=..., this=) at stan/src/stan/model/model_base.hpp:329 #17 stan::model::model_functional::operator() > (x=..., this=0x7fffffffbc50) at stan/src/stan/model/model_functional.hpp:21 #18 stan::math::gradient > (f=..., x=..., fx=@0x7fffffffcb40: -882.67991549259148, grad_fx=...) at stan/lib/stan_math/stan/math/rev/functor/gradient.hpp:51 #19 0x000055555566d99a in stan::model::gradient (model=..., x=..., f=@0x7fffffffcb40: -882.67991549259148, grad_f=..., logger=...) at stan/src/stan/model/gradient.hpp:27 #20 0x000055555566e012 in stan::mcmc::base_hamiltonian >::update_potential_gradient (this=0x7fffffffcb60, z=..., logger=...) at stan/src/stan/mcmc/hmc/hamiltonians/base_hamiltonian.hpp:63 #21 0x000055555566e381 in stan::mcmc::expl_leapfrog > >::update_q (this=, z=..., hamiltonian=warning: RTTI symbol not found for class 'stan::mcmc::diag_e_metric >' ..., epsilon=-0.018627259759257088, logger=...) at stan/src/stan/mcmc/hmc/integrators/expl_leapfrog.hpp:25 #22 0x00005555555e4c89 in stan::mcmc::base_leapfrog > >::evolve (this=0x7fffffffcb58, z=..., hamiltonian=warning: RTTI symbol not found for class 'stan::mcmc::diag_e_metric >' ..., epsilon=-0.018627259759257088, logger=...) at stan/src/stan/mcmc/hmc/integrators/base_leapfrog.hpp:20 #23 0x000055555570fc55 in stan::mcmc::base_nuts >::build_tree (this=this@entry=0x7fffffffcb00, depth=0, z_propose=..., p_sharp_beg=..., p_sharp_end=..., rho=..., p_beg=..., p_end=..., H0=H0@entry=-881.53588851820155, sign=sign@entry=-1, n_leapfrog=@0x7fffffffc1f4: 0, log_sum_weight=@0x7fffffffc208: -inf, sum_metro_prob=@0x7fffffffc200: 0, logger=...) at stan/src/stan/mcmc/hmc/nuts/base_nuts.hpp:254 #24 0x0000555555710bb4 in stan::mcmc::base_nuts >::transition (this=this@entry=0x7fffffffcb00, init_sample=..., logger=...) at stan/src/stan/mcmc/hmc/nuts/base_nuts.hpp:150 #25 0x0000555555711740 in stan::mcmc::adapt_diag_e_nuts >::transition (this=0x7fffffffcb00, init_sample=..., logger=...) at stan/src/stan/mcmc/hmc/nuts/adapt_diag_e_nuts.hpp:26 #26 0x00005555556605fb in stan::services::util::generate_transitions > (sampler=warning: RTTI symbol not found for class 'stan::mcmc::adapt_diag_e_nuts >' ..., num_iterations=num_iterations@entry=250, start=start@entry=0, finish=finish@entry=500, num_thin=num_thin@entry=1, refresh=refresh@entry=1, save=true, warmup=true, mcmc_writer=..., init_s=..., model=..., base_rng=..., callback=..., logger=..., chain_id=1, num_chains=1) at stan/src/stan/services/util/generate_transitions.hpp:70 #27 0x00005555556795ab in stan::services::util::run_adaptive_sampler >, stan::model::model_base, boost::random::mixmax_engine<17, 36u, 0l> > (sampler=warning: RTTI symbol not found for class 'stan::mcmc::adapt_diag_e_nuts >' ..., model=..., cont_vector=std::vector of length 3, capacity 3 = {...}, num_warmup=num_warmup@entry=250, num_samples=num_samples@entry=250, num_thin=num_thin@entry=1, refresh=1, save_warmup=true, rng=..., interrupt=..., logger=..., sample_writer=..., diagnostic_writer=..., metric_writer=..., chain_id=1, num_chains=1) at stan/src/stan/services/util/run_adaptive_sampler.hpp:78 #28 0x0000555555679c65 in stan::services::sample::hmc_nuts_diag_e_adapt (model=..., init=..., init_inv_metric=..., random_seed=random_seed@entry=2741745723, chain=chain@entry=1, init_radius=init_radius@entry=2, num_warmup=num_warmup@entry=250, num_samples=250, num_thin=1, save_warmup=true, refresh=1, stepsize=stepsize@entry=1, stepsize_jitter=stepsize_jitter@entry=0, max_depth=10, delta=delta@entry=0.80000000000000004, gamma=gamma@entry=0.050000000000000003, kappa=kappa@entry=0.75, t0=t0@entry=10, init_buffer=75, term_buffer=50, window=25, interrupt=..., logger=..., init_writer=..., sample_writer=..., diagnostic_writer=..., metric_writer=...) at stan/src/stan/services/sample/hmc_nuts_diag_e_adapt.hpp:103 #29 0x000055555567c1ed in stan::services::sample::hmc_nuts_diag_e_adapt, stan::callbacks::writer, stan::callbacks::unique_stream_writer >, std::default_delete > > >, stan::callbacks::unique_stream_writer >, std::default_delete > > >, stan::callbacks::json_writer >, std::default_delete > > > > (model=..., num_chains=num_chains@entry=1, init=std::vector of length 1, capacity 1 = {...}, random_seed=random_seed@entry=2741745723, init_chain_id=init_chain_id@entry=1, init_radius=init_radius@entry=2, num_warmup=num_warmup@entry=250, num_samples=250, num_thin=1, save_warmup=true, refresh=1, stepsize=stepsize@entry=1, stepsize_jitter=stepsize_jitter@entry=0, max_depth=10, delta=delta@entry=0.80000000000000004, gamma=gamma@entry=0.050000000000000003, kappa=kappa@entry=0.75, t0=t0@entry=10, init_buffer=75, term_buffer=50, window=25, interrupt=..., logger=..., init_writer=std::vector of length 1, capacity 1 = {...}, sample_writer=std::vector of length 1, capacity 1 = {...}, diagnostic_writer=std::vector of length 1, capacity 1 = {...}, metric_writer=std::vector of length 1, capacity 1 = {...}) at stan/src/stan/services/sample/hmc_nuts_diag_e_adapt.hpp:560 #30 0x00005555555e20ce in cmdstan::command (argc=, argv=) at src/cmdstan/command.hpp:684 #31 0x00005555555e4239 in main (argc=, argv=) at src/cmdstan/main.cpp:6 ```

It identifies the failure as coming from this line: https://github.com/stan-dev/math/blob/8e76f01fa078f27f2bff0e259fca3931da86c942/stan/math/prim/functor/hcubature.hpp#L460

Expected Output

Current Version:

v4.9.0

WardBrian commented 1 month ago

The matrices in box.b_ and box.a_ have data that appears to point to nullptr

WardBrian commented 1 month ago

Here is the reproducible example in a slightly easier to use format, independent of the R code: https://gist.github.com/WardBrian/c6d37114248be76948552fa2ed9c7770

This will fail shortly after iteration 5 of warmup.

Franzi2114 commented 1 month ago

So, what does it mean? Does the iterator kdiv_ run out of range? Or does it mean something else? This error did never appear in all the tests and with simulated data.

WardBrian commented 1 month ago

@Franzi2114 yes, it means that line is trying to index outside the matrix. In particular, it looked like the matrix that was failing had size 0 and no data

andrjohns commented 1 month ago

When doing some local testing, I can reliably reproduce when the loop:

for (i in 1:N) {

is only looping from 50 -> 52:

for (i in 50:52) {

Hopefully that helps simplify/speedup debugging somewhat

WardBrian commented 1 month ago

@andrjohns thanks, that does speed it up considerably.

If it matters, the specific line that is failing is the "left stimulus, right response" line target += wiener_lpdf(rt[i] | a, t0, 1 - w, -v[cnd[i]], sv, sw, st0);

andrjohns commented 1 month ago

Logged the values until it failed, here's a c++ mre:

#include <stan/math.hpp>
#include <iostream>

int main() {
  using stan::math::internal::GradientCalc;
  using stan::math::internal::wiener7_integrate;
  using stan::math::internal::wiener5_density;

  auto params_st = std::make_tuple(0.553273, 1, -3.5, 0.55, 0.553161,
                                    1.06423, 0.0941929, 0.0, -28.3242);
  Eigen::VectorXd xmin = Eigen::VectorXd::Zero(2);
  Eigen::VectorXd xmax = Eigen::VectorXd::Ones(2);

  auto f = wiener7_integrate<GradientCalc::OFF, GradientCalc::OFF>(
              [](auto&&... args) { return wiener5_density<GradientCalc::ON>(args...); },
              -18.3029, params_st, 1, xmin, xmax, 6000, 0, 4.5e-05);
}

WardBrian commented 1 month ago

I think it is because of the use of std::move on box.a_ and box.b_ on lines 482 and 483. My understanding is this leaves behind null data in the current box, which if it ever gets re-selected at the top of the loop leads to this error.

@Franzi2114 does that sound like the kind of thing that would happen? I'm not sure I understand the criteria by which the current box is selected enough to know if it can happen more than once

These moves were introduced in a68fafb8c24a961123f4d55163f733a619268568

WardBrian commented 1 month ago

Changing 482 and 483 to

    box_t box1(std::move(ma), box.b_, result_1, kdivide_1);
    box_t box2(box.a_, std::move(mb), result_2, kdivide_2);

seems to resolve the issue - at least for that MRE and the 50:52 simplification of the original model. Running the original model will be slow but probably worth doing at this point

Franzi2114 commented 1 month ago

Thanks! Yes, this sounds reasonable.

stan-dev / math