stan-dev / pystan2

PyStan, the Python interface to Stan
GNU General Public License v3.0

Segfault in simple probit model #754

Closed samvoisin closed 3 years ago

samvoisin commented 3 years ago

I am doing probit regression in PyStan 2. I have run several models, including a Bayesian linear regression model, without issue. However, when I try to run the code below I get a segmentation fault.

Interestingly, this only occurs after the warm-up iterations have completed. My model code is here. I am currently making up some data and trying to infer the parameters beta_tru_1 and alpha. The model code comes from the Stan User's Guide.

import pystan
import numpy as np
from scipy.stats import norm

# the params
beta_tru_1 = 3.7
alpha = 2.3

# make some data
n = 1000
np.random.seed(1)
x1 = norm(0, 1).rvs(n)
z = alpha + x1 * beta_tru_1
y = [1 if i > 0.7 else 0 for i in norm.cdf(z)]

# train test split
y_train, y_test = y[:750], y[750:]
x_train, x_test = x1[:750], x1[750:]

# stan code
probit_code = """
data {
    int<lower=0> n; // number of data vectors
    real x[n]; // data matrix
    int<lower=0,upper=1> y[n]; // response vector
}
parameters {
    real beta; // regression coefs
    real alpha;
}
model {
    for (i in 1:n)
      y[i] ~ bernoulli(Phi(alpha + beta * x[i]));
}
"""

# compile the model
probit_model = pystan.StanModel(model_code=probit_code)

# the data
probit_dat = {
    "n": len(y_train),
    "y": y_train,
    "x": x_train
}

# fit the model (small number of iterations for debug)
# this is where the error is
probit_fit = probit_model.sampling(data=probit_dat, iter=500, warmup=500, chains=4, init="0")

I ran the code with strace -vd and got the output below. The first line here is the last statement written to stdout when the warm-up iterations are complete.

write(1, " Elapsed Time: 3.02104 seconds ("..., 41 Elapsed Time: 3.02104 seconds (Warm-up)
strace: [wait(0x00857f) = 286004] WIFSTOPPED,sig=133
strace: next_event: queued pid 286004
strace: next_event: dequeued pid 286004
) = 41
strace: [wait(0x00857f) = 286004] WIFSTOPPED,sig=133
strace: next_event: queued pid 286004
strace: next_event: dequeued pid 286004
write(1, "               9e-06 seconds (Sa"..., 40               9e-06 seconds (Sampling)
strace: [wait(0x00857f) = 286004] WIFSTOPPED,sig=133
strace: next_event: queued pid 286004
strace: next_event: dequeued pid 286004
) = 40
strace: [wait(0x00857f) = 286004] WIFSTOPPED,sig=133
strace: next_event: queued pid 286004
strace: next_event: dequeued pid 286004
write(1, "               3.02105 seconds ("..., 39               3.02105 seconds (Total)
strace: [wait(0x00857f) = 286004] WIFSTOPPED,sig=133
strace: next_event: queued pid 286004
strace: next_event: dequeued pid 286004
) = 39
strace: [wait(0x00857f) = 286004] WIFSTOPPED,sig=133
strace: next_event: queued pid 286004
strace: next_event: dequeued pid 286004
write(1, "\n", 1
strace: [wait(0x00857f) = 286004] WIFSTOPPED,sig=133
strace: next_event: queued pid 286004
strace: next_event: dequeued pid 286004
)                       = 1
strace: [wait(0x00857f) = 286004] WIFSTOPPED,sig=133
strace: next_event: queued pid 286004
strace: next_event: dequeued pid 286004
fstat(3, strace: [wait(0x00857f) = 286004] WIFSTOPPED,sig=133
strace: next_event: queued pid 286004
strace: next_event: dequeued pid 286004
{st_dev=makedev(0, 0x5), st_ino=11, st_mode=S_IFCHR|0666, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=0, st_rdev=makedev(0x1, 0x9), st_atime=1613135331 /* 2021-02-12T08:08:51.295345943-0500 */, st_atime_nsec=295345943, st_mtime=1613135331 /* 2021-02-12T08:08:51.295345943-0500 */, st_mtime_nsec=295345943, st_ctime=1613135331 /* 2021-02-12T08:08:51.295345943-0500 */, st_ctime_nsec=295345943}) = 0
strace: [wait(0x00857f) = 286004] WIFSTOPPED,sig=133
strace: next_event: queued pid 286004
strace: next_event: dequeued pid 286004
read(3, strace: [wait(0x00857f) = 286004] WIFSTOPPED,sig=133
strace: next_event: queued pid 286004
strace: next_event: dequeued pid 286004
"t_\244'\215}\264\377\31\26\211G\334\301\262\242", 16) = 16
strace: [wait(0x000b7f) = 286004] WIFSTOPPED,sig=SIGSEGV
strace: next_event: queued pid 286004
strace: next_event: dequeued pid 286004
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---
strace: [wait(0x06057f) = 286004] WIFSTOPPED,sig=SIGTRAP,EVENT_EXIT (6)
strace: next_event: queued pid 286004
strace: next_event: dequeued pid 286004
strace: [wait(0x00008b) = 286004] WIFSIGNALED,core,sig=SIGSEGV
strace: next_event: queued pid 286004
strace: next_event: dequeued pid 286004
+++ killed by SIGSEGV (core dumped) +++
strace: dropped tcb for pid 286004, 0 remain
Segmentation fault (core dumped)

I am using PyStan v. 2.19.1.1 and Python 3.7.6 on Linux Pop OS 20.10. I have run this code on multiple machines, including an Ubuntu container, with no luck. Is this a known Stan issue? Am I doing something obviously wrong in my code? Any help is appreciated.

riddell-stan commented 3 years ago

Transferring this to the pystan2 repo.

This doesn't look like anything I've seen before. Do you have enough memory?

Do you want to try PyStan 3? Since you're on Linux things should work fine.

ahartikainen commented 3 years ago

iter == warmup -> there are no post-warmup samples, so it fails at that point. In PyStan 2, iter is the total number of iterations per chain, warmup included.

Use iter=1000 to get 500 post-warmup samples.
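As a rough illustration of the arithmetic, here is a minimal sketch of a pre-flight check that could be run before calling sampling. The helper name post_warmup_draws and its parameter n_iter are hypothetical (not part of PyStan); the only PyStan 2 behaviour assumed is that iter counts warmup and that warmup defaults to iter // 2.

```python
def post_warmup_draws(n_iter, warmup=None, chains=4):
    """Return the total number of post-warmup draws across all chains.

    Hypothetical helper, not part of PyStan. It mirrors how PyStan 2
    interprets its arguments: `iter` is the total per-chain iteration
    count and already includes `warmup`.
    """
    if warmup is None:
        # PyStan 2 uses half of the iterations as warmup by default.
        warmup = n_iter // 2
    kept_per_chain = n_iter - warmup
    if kept_per_chain <= 0:
        raise ValueError(
            f"iter={n_iter} and warmup={warmup} leave no post-warmup draws; "
            "increase iter or reduce warmup"
        )
    return kept_per_chain * chains


# Settings suggested in this thread: 500 warmup + 500 kept draws per chain.
print(post_warmup_draws(1000, warmup=500, chains=4))

# The corresponding call from the original post would then be
# (left commented out because it needs the compiled model and data):
# probit_fit = probit_model.sampling(
#     data=probit_dat, iter=1000, warmup=500, chains=4, init="0"
# )
```

With the original settings (iter=500, warmup=500) this check raises a ValueError instead of letting the run reach the sampler.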

samvoisin commented 3 years ago

Wow, that worked. Thank you @ahartikainen!

Also thank you to @riddell-stan for putting this issue in the right place. I will start working with pystan3 over the weekend. Looking forward to trying it out.