Closed: drezap closed this issue 6 years ago
Data size can cause out of memory errors, but it shouldn't lead to a segfault.
Could you create a complete end-to-end example that fails in the way you're not expecting? It's not possible to call that _rng function in the transformed parameters block, so there must be a different setup when there aren't segfaults in transformed parameters.
Is there a problem if you assign to two matrices then add?
matrix[N, N_pred] a = gp_dot_prod_cov(x, x_pred, sig);
matrix[N, N_pred] b = gp_exp_quad_cov(x, x_pred, magnitude, length_scale);
matrix[N, N_pred] ab = a + b;
P.S. When opening an issue please label it with deliverable and bug status and ideally with an assignee or a tag that it's a good first issue.
Thanks for the detailed response, I'll be sure to include all this when I open an issue next time.
A few things to note: This is not big data, N=303. Any time I sum kernels in the functions block, even with smaller data, I get this segfault.
1) I'm attaching one end-to-end example that gives the segfault. It can be run in CmdStan with ./logistic_gp_segfault sample num_samples=200 num_warmup=200 data file=heart_disease_classification.data.R using these files:
logistic_gp_segfault.txt
heart_disease_classification.data.R.txt
I've changed the extensions to .txt so I could upload directly. The first file is Stan code, so its extension needs to be changed from .txt to .stan.
Is there a problem if you assign to two matrices then add?
Yep, I've attached another Stan program where I get a segfault with this strategy. It can be run with the same data and a similar command as above. logistic_gp_segfault_bob.txt
P.S. When opening an issue please label it with deliverable and bug status and ideally with an assignee or a tag that it's a good first issue.
Will do! Thanks for the workflow feedback.
Thanks. Stan shouldn't be segfaulting no matter what happens. So we need to get to the bottom of this.
I updated everything to develop, but I can't compile that file:
No matches for:
gp_dot_prod_cov(vector[], real)
Function gp_dot_prod_cov not found.
error in '/Users/carp/temp2/drezap/gp-segfault-2.stan' at line 10, column 48
-------------------------------------------------
8: vector[N_pred] f2;
9: {
10: matrix[N, N] k1 = gp_dot_prod_cov(x, sig);
^
11: matrix[N, N] k2 = gp_exp_quad_cov(x, magnitude, length_scale);
-------------------------------------------------
Are you working on a branch or something?
I'd suggest trying to debug what's going into those matrices k1 and k2. When you're done, are they actually N x N? You should be able to print them out or print out elements that aren't finite (using is_inf() and is_nan()). Multiplication itself shouldn't be causing a problem.
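The same two checks (shape and finiteness) can be sketched outside Stan in plain C++. This is only an illustration: the Matrix alias and the helper names has_shape and count_non_finite are hypothetical and are not part of Stan or the Math library.

```cpp
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical stand-in for a dense matrix: a vector of rows.
using Matrix = std::vector<std::vector<double>>;

// Returns true if m has exactly `rows` rows and every row has `cols` entries.
bool has_shape(const Matrix& m, std::size_t rows, std::size_t cols) {
  if (m.size() != rows) return false;
  for (const auto& row : m) {
    if (row.size() != cols) return false;
  }
  return true;
}

// Counts entries that are NaN or +/-infinity and reports where each one is.
std::size_t count_non_finite(const Matrix& m, std::ostream& out = std::cout) {
  std::size_t bad = 0;
  for (std::size_t i = 0; i < m.size(); ++i) {
    for (std::size_t j = 0; j < m[i].size(); ++j) {
      if (!std::isfinite(m[i][j])) {
        out << "non-finite entry at (" << i << ", " << j << "): "
            << m[i][j] << std::endl;
        ++bad;
      }
    }
  }
  return bad;
}
```

Calling has_shape(k1, N, N) and count_non_finite(k1) right after a matrix is built would show whether the kernel output is already malformed before it is summed.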
The other thing to do is go in and instrument the generated .hpp with print statements directed to std::cout, using std::endl to flush; then you can diagnose where the segfault arises.
You might also be able to detect some kind of problem with the types looking at the generated code. What's the return type for your functions?
I'm working on a hacked local cmdstan-dev.... to get this to work you need to add these lines to the function signatures:
add("gp_dot_prod_cov", expr_type(matrix_type()), expr_type(double_type(), 1U), expr_type(double_type()));
add("gp_dot_prod_cov", expr_type(matrix_type()), expr_type(vector_type(), 1U), expr_type(double_type()));
add("gp_dot_prod_cov", expr_type(matrix_type()), expr_type(row_vector_type(), 1U), expr_type(double_type()));
// x1, x2
add("gp_dot_prod_cov", expr_type(matrix_type()), expr_type(double_type(), 1U), expr_type(double_type(), 1U), expr_type(double_type()));
add("gp_dot_prod_cov", expr_type(matrix_type()), expr_type(vector_type(), 1U), expr_type(vector_type(), 1U), expr_type(double_type()));
and then make the changes as in PR #980 (just copy and paste the file).
I'll take your suggestions as to how to debug when I get the chance, much appreciated.
Or likewise, just sum two gp_exp_quad_cov kernels, and you'll get the same issue. I'll figure it out when I put some time into it.
I traced this to hmc_nuts_diag_e_adapt.hpp, where we have a recursive function with no terminating condition that is probably causing a stack overflow.
I'm taking a look at "Hamiltonian Monte Carlo for Hierarchical Models" and "The No-U-Turn Sampler" to see how the adaptive parameters are calculated, but these are likely outdated.
Can someone point me to a paper that describes the current state of the epsilon adaptation? (easier to read math and papers than chase the C++ around). I want to at least propose a solution, as it will help my understanding of HMC/NUTS.
An exploding recursion should be easy to verify before digging into the mechanics. Can you run this in a debugger (build everything with -Og), run it in gdb or lldb, and then check the backtrace?
I'll be in Helsinki tomorrow afternoon. We can look at this then if you've got other stuff to do.
This has absolutely nothing to do with what I said above; it just went out of index on a vector. I'm addressing this in #980, as it's taken care of by vectorizing the error checks.
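For context, reading past the end of a std::vector with operator[] is undefined behavior in C++ and can surface as a segfault far from the real cause. A bounds check that throws turns it into a readable error instead. The sketch below is a generic illustration; checked_get is a hypothetical helper and is not the check added in #980.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical helper: return v[i], but throw std::out_of_range with a
// descriptive message instead of invoking undefined behavior when i is
// past the end of the vector.
double checked_get(const std::vector<double>& v, std::size_t i) {
  if (i >= v.size()) {
    std::ostringstream msg;
    msg << "index " << i << " out of range for vector of size " << v.size();
    throw std::out_of_range(msg.str());
  }
  return v[i];
}
```

std::vector::at performs the same kind of check; the explicit version here only makes the error message easier to control.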
Description
Whenever we try to sum kernels in a gp_pred_rng, we get a segmentation fault. I've tried only using the predictive mean, and still, we get a seg fault.
Example
Here's an example of the kind of gp_pred_rng that will give a seg fault. I'm also attaching full Stan code and datasets, where if we sum kernels in the pred_rng, we experience the same issue every time:
Additional Information
There are no seg faults when summing kernels in the transformed parameters block when I take the kernels out of scope, only when I implement something which sums kernels for the posterior predictive distribution in the functions block. These datasets are relatively small, so there should not be a segmentation fault in these cases.
gp_regression.txt housing.txt logistic_gp.txt heart-disease-uci.zip
Current Math Version
v2.18.0