mackelab / delfi

Density estimation likelihood-free inference. No longer actively developed see https://github.com/mackelab/sbi instead
http://www.mackelab.org/delfi

Segmentation Fault when extending HH-model to 7 parameters with the range in the paper #65

Closed holmosaint closed 4 years ago

holmosaint commented 4 years ago

Nothing goes wrong when sampling only two parameters. The error messages are as follows.

holmosaint commented 4 years ago

Run simulations (pilot run) : 1%|█▎ | 1000/100000 [03:31<36:18:43, 1.32s/it]
[e5:24277] *** Process received signal ***
[e5:24277] Signal: Segmentation fault (11)
[e5:24277] Signal code: Address not mapped (1)
[e5:24277] Failing at address: 0x558799168f20
[e5:24277] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f76f9c08390]
[e5:24277] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x81c74)[0x7f76f98aec74]
[e5:24277] [ 2] /lib/x86_64-linux-gnu/libc.so.6(__libc_malloc+0x54)[0x7f76f98b1184]
[e5:24277] [ 3] /home/skw/anaconda3/envs/py3.6/lib/python3.6/site-packages/numpy/core/_multiarray_umath.cpython-36m-x86_64-linux-gnu.so(+0x1850f2)[0x7f76f749f0f2]
[e5:24277] [ 4] /home/skw/anaconda3/envs/py3.6/lib/python3.6/site-packages/numpy/core/_multiarray_umath.cpython-36m-x86_64-linux-gnu.so(+0x185b17)[0x7f76f749fb17]
[e5:24277] [ 5] /home/skw/anaconda3/envs/py3.6/lib/python3.6/site-packages/numpy/core/_multiarray_umath.cpython-36m-x86_64-linux-gnu.so(+0x150982)[0x7f76f746a982]
[e5:24277] [ 6] /home/skw/anaconda3/envs/py3.6/lib/python3.6/site-packages/numpy/core/_multiarray_umath.cpython-36m-x86_64-linux-gnu.so(+0x155a8f)[0x7f76f746fa8f]
[e5:24277] [ 7] /home/skw/anaconda3/envs/py3.6/lib/python3.6/site-packages/numpy/core/_multiarray_umath.cpython-36m-x86_64-linux-gnu.so(+0x1568c1)[0x7f76f74708c1]
[e5:24277] [ 8] python(_PyObject_FastCallDict+0x8b)[0x558796bfe92b]
[e5:24277] [ 9] python(PyObject_CallFunctionObjArgs+0xed)[0x558796c1bded]
[e5:24277] [10] python(PyNumber_InPlaceMultiply+0x53)[0x558796c4c823]
[e5:24277] [11] python(_PyEval_EvalFrameDefault+0x3913)[0x558796cad993]
[e5:24277] [12] python(+0x1918e4)[0x558796c7e8e4]
[e5:24277] [13] python(+0x192771)[0x558796c7f771]
[e5:24277] [14] python(+0x198505)[0x558796c85505]
[e5:24277] [15] python(_PyEval_EvalFrameDefault+0x30a)[0x558796caa38a]
[e5:24277] [16] python(+0x19253b)[0x558796c7f53b]
[e5:24277] [17] python(+0x198505)[0x558796c85505]
[e5:24277] [18] python(_PyEval_EvalFrameDefault+0x30a)[0x558796caa38a]
[e5:24277] [19] python(+0x19253b)[0x558796c7f53b]
[e5:24277] [20] python(+0x198505)[0x558796c85505]
[e5:24277] [21] python(_PyEval_EvalFrameDefault+0x30a)[0x558796caa38a]
[e5:24277] [22] python(+0x19253b)[0x558796c7f53b]
[e5:24277] [23] python(+0x198505)[0x558796c85505]
[e5:24277] [24] python(_PyEval_EvalFrameDefault+0x30a)[0x558796caa38a]
[e5:24277] [25] python(+0x19253b)[0x558796c7f53b]
[e5:24277] [26] python(+0x198505)[0x558796c85505]
[e5:24277] [27] python(_PyEval_EvalFrameDefault+0x30a)[0x558796caa38a]
[e5:24277] [28] python(+0x19253b)[0x558796c7f53b]
[e5:24277] [29] python(+0x198505)[0x558796c85505]
[e5:24277] *** End of error message ***

jan-matthis commented 4 years ago

Could you post code for a minimal example producing the error above?

holmosaint commented 4 years ago

Thanks for your reply. The error occurs only occasionally: sometimes before the simulations even begin, sometimes during the simulation. When it fails at the beginning, it often leaves behind a large number of child processes that are never collected.

It is the same code as the HH-model tutorial for delfi, except that I changed the number of parameters to 8, as follows:

    gbar_Na = params[0, 0].astype(float)    # mS/cm2
    gbar_K = params[0, 1].astype(float)     # mS/cm2
    gbar_leak = params[0, 2].astype(float)  # mS/cm2
    gbar_M = params[0, 3].astype(float)     # mS/cm2
    tau_max = params[0, 4].astype(float)    # ms
    Vt = params[0, 5].astype(float)         # mV
    nois_fact = params[0, 6].astype(float)  # uA/cm2
    E_leak = params[0, 7].astype(float)     # mV
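Note that NumPy's `astype` returns a converted copy rather than casting in place, so its result must be assigned back; a bare `x.astype(float)` statement on a line by itself does nothing. A quick illustration:

```python
import numpy as np

x = np.array([1, 2, 3])
x.astype(float)          # no-op as a statement: the converted copy is discarded
assert x.dtype.kind == 'i'

y = x.astype(float)      # assigning the result performs the conversion
assert y.dtype == np.float64
```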

The ranges of the parameters are the same as those in your paper on bioRxiv. The true parameters are:

true_params = np.array([50., 1., 0.03, 0.03, 10, 40, 0.12, 50])

The training configuration is:

seed_inf = 1

pilot_samples = int(1e5)

# training schedule
n_train = 2000
n_rounds = 1

# fitting setup
minibatch = 100
epochs = 100
val_frac = 0.05

# network setup
n_hiddens = [50,50]

# convenience
prior_norm = True

# MAF parameters
density = 'maf'
n_mades = 5         # number of MADES

import delfi.inference as infer

# inference object
res = infer.SNPEC(g,
                obs=obs_stats,
                n_hiddens=n_hiddens,
                seed=seed_inf,
                pilot_samples=pilot_samples,
                n_mades=n_mades,
                prior_norm=prior_norm,
                density=density)

# train
log, _, posterior = res.run(
                    n_train=n_train,
                    n_rounds=n_rounds,
                    minibatch=minibatch,
                    epochs=epochs,
                    silent_fail=False,
                    proposal='prior',
                    val_frac=val_frac,
                    verbose=True,)
jan-matthis commented 4 years ago

Can you post a complete script, i.e., include how you set up the generator?

From your error and what you write it sounds like a problem with MPGenerator. Does using the default generator work without problems?

holmosaint commented 4 years ago

So far I have not hit this problem with a single default generator, but in that case I use far fewer than 1e5 pilot samples, since otherwise it would take a very long time.

The script is as follows:

#!/usr/bin/env python
# coding: utf-8

# # Simulation and density estimation of HH model

# In[2]:

import os
os.environ["MKL_THREADING_LAYER"] = "GNU"

import numpy as np

def syn_current(duration=120, dt=0.01, t_on = 10,
                curr_level = 5e-4, seed=None):
    t_offset = 0.
    duration = duration
    t_off = duration - t_on
    t = np.arange(0, duration+dt, dt)

    # external current
    A_soma = np.pi*((70.*1e-4)**2)  # cm2
    I = np.zeros_like(t)
    I[int(np.round(t_on/dt)):int(np.round(t_off/dt))] = curr_level/A_soma # muA/cm2

    return I, t_on, t_off, dt, t, A_soma

# In[3]:

def HHsimulator(V0, params, dt, t, I, seed=None):
    """Simulates the Hodgkin-Huxley model for a specified time duration and current

        Parameters
        ---------- 
        V0 : float
            Voltage at first time step
        params : np.array, 1d of length dim_param
            Parameter vector
        dt : float
            Timestep
        t : array
            Numpy array with the time steps
        I : array
            Numpy array with the input current
        seed : int
        """

    gbar_Na = params[0, 0].astype(float)    # mS/cm2
    gbar_K = params[0, 1].astype(float)     # mS/cm2
    gbar_leak = params[0, 2].astype(float)  # mS/cm2
    gbar_M = params[0, 3].astype(float)     # mS/cm2
    tau_max = params[0, 4].astype(float)    # ms
    Vt = params[0, 5].astype(float)         # mV
    nois_fact = params[0, 6].astype(float)  # uA/cm2
    E_leak = params[0, 7].astype(float)     # mV

    # fixed parameters
    C = 1.          # uF/cm2
    E_Na = 53       # mV
    E_K = -107      # mV

    tstep = float(dt)

    if seed is not None:
        rng = np.random.RandomState(seed=seed)
    else:
        rng = np.random.RandomState()

    ####################################
    # kinetics
    def efun(z):
        if np.abs(z) < 1e-4:
            return 1 - z/2
        else:
            return z / (np.exp(z) - 1)

    def alpha_m(x):
        v1 = x - Vt - 13.
        return 0.32*efun(-0.25*v1)/0.25

    def beta_m(x):
        v1 = x - Vt - 40
        return 0.28*efun(0.2*v1)/0.2

    def alpha_h(x):
        v1 = x - Vt - 17.
        return 0.128*np.exp(-v1/18.)

    def beta_h(x):
        v1 = x - Vt - 40.
        return 4.0/(1 + np.exp(-0.2*v1))

    def alpha_n(x):
        v1 = x - Vt - 15.
        return 0.032*efun(-0.2*v1)/0.2

    def beta_n(x):
        v1 = x - Vt - 10.
        return 0.5*np.exp(-v1/40)

    # steady-states and time constants
    def tau_n(x):
         return 1/(alpha_n(x) + beta_n(x))
    def n_inf(x):
        return alpha_n(x)/(alpha_n(x) + beta_n(x))
    def tau_m(x):
        return 1/(alpha_m(x) + beta_m(x))
    def m_inf(x):
        return alpha_m(x)/(alpha_m(x) + beta_m(x))
    def tau_h(x):
        return 1/(alpha_h(x) + beta_h(x))
    def h_inf(x):
        return alpha_h(x)/(alpha_h(x) + beta_h(x))

    # slow non-inactivating K+
    def p_inf(x):
        v1 = x + 35.
        return 1.0/(1. + np.exp(-0.1*v1))

    def tau_p(x):
        v1 = x + 35.
        return tau_max/(3.3*np.exp(0.05*v1) + np.exp(-0.05*v1))

    ####################################
    # simulation from initial point
    V = np.zeros_like(t) # voltage
    n = np.zeros_like(t)
    m = np.zeros_like(t)
    h = np.zeros_like(t)
    p = np.zeros_like(t)

    V[0] = float(V0)
    n[0] = n_inf(V[0])
    m[0] = m_inf(V[0])
    h[0] = h_inf(V[0])
    p[0] = p_inf(V[0])

    for i in range(1, t.shape[0]):
        tau_V_inv = ( (m[i-1]**3)*gbar_Na*h[i-1]+(n[i-1]**4)*gbar_K+gbar_leak+gbar_M*p[i-1] )/C
        V_inf = ( (m[i-1]**3)*gbar_Na*h[i-1]*E_Na+(n[i-1]**4)*gbar_K*E_K+gbar_leak*E_leak+gbar_M*p[i-1]*E_K
                +I[i-1]+nois_fact*rng.randn()/(tstep**0.5) )/(tau_V_inv*C)
        V[i] = V_inf + (V[i-1]-V_inf)*np.exp(-tstep*tau_V_inv)
        n[i] = n_inf(V[i])+(n[i-1]-n_inf(V[i]))*np.exp(-tstep/tau_n(V[i]))
        m[i] = m_inf(V[i])+(m[i-1]-m_inf(V[i]))*np.exp(-tstep/tau_m(V[i]))
        h[i] = h_inf(V[i])+(h[i-1]-h_inf(V[i]))*np.exp(-tstep/tau_h(V[i]))
        p[i] = p_inf(V[i])+(p[i-1]-p_inf(V[i]))*np.exp(-tstep/tau_p(V[i]))

    return np.array(V).reshape(-1,1)

# In[30]:

import matplotlib as mpl
mpl.use('Agg')
import matplotlib.pyplot as plt

# input current, time step, time array
I, t_on, t_off, dt, t, A_soma = syn_current()

# In[5]:

from delfi.simulator.BaseSimulator import BaseSimulator

class HodgkinHuxley(BaseSimulator):
    def __init__(self, I, dt, V0, dim_param, seed=None):
        """Hodgkin-Huxley simulator

        Parameters
        ----------
        I : array
            Numpy array with the input current
        dt : float
            Timestep
        V0 : float
            Voltage at first time step
        seed : int or None
            If set, randomness across runs is disabled
        """

        super().__init__(dim_param=dim_param, seed=seed)
        self.I = I
        self.dt = dt
        self.t = np.arange(0, len(self.I), 1)*self.dt
        self.HHsimulator = HHsimulator
        self.init = V0

    def gen_single(self, params):
        """Forward model for simulator for single parameter set

        Parameters
        ----------
        params : list or np.array, 1d of length dim_param
            Parameter vector

        Returns
        -------
        dict : dictionary with data
            The dictionary must contain a key data that contains the results of
            the forward run. Additional entries can be present.
        """
        params = np.asarray(params)

        assert params.ndim == 1, 'params.ndim must be 1'

        hh_seed = self.gen_newseed()

        states = self.HHsimulator(self.init, params.reshape(1, -1), self.dt, self.t, self.I, seed=hh_seed)

        return {'data': states.reshape(-1),
                'time': self.t,
                'dt': self.dt,
                'I': self.I.reshape(-1)}

# In[6]:

import delfi.distribution as dd

seed_p = 2
prior_min = np.array([0.5, 1e-4, 1e-4, 1e-4, 50, 40, 1e-4, 35])
prior_max = np.array([80., 15, 0.6, 0.6, 3000, 90., 0.15, 100])
prior = dd.Uniform(lower=prior_min, upper=prior_max,seed=seed_p)

# We want to fit a set of summary statistics of the observed data: the number of spikes, the mean and standard deviation of the resting potential, and the first 4 voltage moments (mean, standard deviation, skewness and kurtosis). To accomplish that, we define a summary statistics class which computes those quantities:

# In[7]:

from delfi.summarystats.BaseSummaryStats import BaseSummaryStats
from scipy import stats as spstats

class HodgkinHuxleyStats(BaseSummaryStats):
    """Moment based SummaryStats class for the Hodgkin-Huxley model

    Calculates summary statistics
    """
    def __init__(self, t_on, t_off, n_mom=4, n_summary=7, seed=None):
        """See SummaryStats.py for docstring"""
        super(HodgkinHuxleyStats, self).__init__(seed=seed)
        self.t_on = t_on
        self.t_off = t_off
        self.n_mom = n_mom
        self.n_summary = np.minimum(n_summary,n_mom + 3)

    def calc(self, repetition_list):
        """Calculate summary statistics

        Parameters
        ----------
        repetition_list : list of dictionaries, one per repetition
            data list, returned by `gen` method of Simulator instance

        Returns
        -------
        np.array, 2d with n_reps x n_summary
        """
        stats = []
        for r in range(len(repetition_list)):
            x = repetition_list[r]

            N = x['data'].shape[0]
            t = x['time']
            dt = x['dt']
            t_on = self.t_on
            t_off = self.t_off

            # initialise array of spike counts
            v = np.array(x['data'])

            # put everything to -10 that is below -10 or has negative slope
            ind = np.where(v < -10)
            v[ind] = -10
            ind = np.where(np.diff(v) < 0)
            v[ind] = -10

            # remaining negative slopes are at spike peaks
            ind = np.where(np.diff(v) < 0)
            spike_times = np.array(t)[ind]
            spike_times_stim = spike_times[(spike_times > t_on) & (spike_times < t_off)]

            # number of spikes
            if spike_times_stim.shape[0] > 0:
                spike_times_stim = spike_times_stim[np.append(1, np.diff(spike_times_stim))>0.5]

            # resting potential and std
            rest_pot = np.mean(x['data'][t<t_on])
            rest_pot_std = np.std(x['data'][int(.9*t_on/dt):int(t_on/dt)])

            # moments
            std_pw = np.power(np.std(x['data'][(t > t_on) & (t < t_off)]),
                              np.linspace(3,self.n_mom,self.n_mom-2))
            std_pw = np.concatenate((np.ones(1),std_pw))
            moments = spstats.moment(x['data'][(t > t_on) & (t < t_off)],
                                     np.linspace(2,self.n_mom,self.n_mom-1))/std_pw

            # concatenation of summary statistics
            sum_stats_vec = np.concatenate((
                    np.array([spike_times_stim.shape[0]]),
                    np.array([rest_pot,rest_pot_std,np.mean(x['data'][(t > t_on) & (t < t_off)])]),
                    moments
                ))
            sum_stats_vec = sum_stats_vec[0:self.n_summary]

            stats.append(sum_stats_vec)

        return np.asarray(stats)

# In[9]:

import delfi.generator as dg

# input current, time step
I, t_on, t_off, dt, t, A_soma = syn_current()

# initial voltage
V0 = -70

# parameter dimension
dim_param = 8

# seeds
seed_m = 1

# summary statistics hyperparameters
n_mom = 4
n_summary = 7

s = HodgkinHuxleyStats(t_on=t_on, t_off=t_off, n_mom=n_mom, n_summary=n_summary)

# In[11]:

n_processes = 15

seeds_m = np.arange(1, n_processes+1, 1)
m = []
for i in range(n_processes):
    m.append(HodgkinHuxley(I, dt, V0=V0, dim_param=dim_param, seed=seeds_m[i]))
g = dg.MPGenerator(models=m, prior=prior, summary=s)

# In[25]:

# true parameters and respective labels
true_params = np.array([50., 1., 0.03, 0.03, 10, 40, 0.12, 50])       
labels_params = [r'$g_{Na}$', r'$g_{K}$', r'$g_{L}$', r'$g_{M}$', r'$\tau_{max}$', r'$V_{T}$', r'$\sigma$', r'$E_{L}$']

# observed data: simulation given true parameters
obs = m[0].gen_single(true_params)

# In[16]:

obs_stats = s.calc([obs])

# In[17]:

seed_inf = 1

pilot_samples = int(1e5)

# training schedule
n_train = 2000
n_rounds = 1

# fitting setup
minibatch = 100
epochs = 100
val_frac = 0.05

# network setup
n_hiddens = [50,50]

# convenience
prior_norm = True

# MAF parameters
density = 'maf'
n_mades = 5         # number of MADES

# In[18]:

import delfi.inference as infer

# inference object
res = infer.SNPEC(g,
                obs=obs_stats,
                n_hiddens=n_hiddens,
                seed=seed_inf,
                pilot_samples=pilot_samples,
                n_mades=n_mades,
                prior_norm=prior_norm,
                density=density)

# In[19]:

# train
log, _, posterior = res.run(
                    n_train=n_train,
                    n_rounds=n_rounds,
                    minibatch=minibatch,
                    epochs=epochs,
                    silent_fail=False,
                    proposal='prior',
                    val_frac=val_frac,
                    verbose=True,)

# In[21]:
base_dir = 'HH Result'
os.makedirs(base_dir, exist_ok=True)

fig = plt.figure(figsize=(15,5))

plt.plot(log[0]['loss'],lw=2)
plt.xlabel('iteration')
plt.ylabel('loss');

plt.savefig(os.path.join(base_dir, 'loss.png'), dpi=400)

# In[24]:

from delfi.utils.viz import samples_nd

prior_min = g.prior.lower
prior_max = g.prior.upper
prior_lims = np.concatenate((prior_min.reshape(-1,1),prior_max.reshape(-1,1)),axis=1)

posterior_samples = posterior[0].gen(10000)

###################
# colors
hex2rgb = lambda h: tuple(int(h[i:i+2], 16) for i in (0, 2, 4))

# RGB colors in [0, 255]
col = {}
col['GT']      = hex2rgb('30C05D')
col['SNPE']    = hex2rgb('2E7FE8')
col['SAMPLE1'] = hex2rgb('8D62BC')
col['SAMPLE2'] = hex2rgb('AF99EF')

# convert to RGB colors in [0, 1]
for k, v in col.items():
    col[k] = tuple([i/255 for i in v])

###################
# posterior
fig, axes = samples_nd(posterior_samples,
                       limits=prior_lims,
                       ticks=prior_lims,
                       labels=labels_params,
                       fig_size=(5,5),
                       diag='kde',
                       upper='kde',
                       hist_diag={'bins': 50},
                       hist_offdiag={'bins': 50},
                       kde_diag={'bins': 50, 'color': col['SNPE']},
                       kde_offdiag={'bins': 50},
                       points=[true_params],
                       points_offdiag={'markersize': 5},
                       points_colors=[col['GT']],
                       title='');

plt.savefig(os.path.join(base_dir, 'posterior.png'), dpi=400)

# In[28]:

fig = plt.figure(figsize=(7,5))

y_obs = obs['data']
t = obs['time']
duration = np.max(t)

num_samp = 200

# sample from posterior
x_samp = posterior[0].gen(n_samples=num_samp)

# reject samples for which prior is zero
ind = (x_samp > prior_min) & (x_samp < prior_max)
params = x_samp[np.prod(ind,axis=1)==1]

num_samp = min(2, len(params[:,0]))

# simulate and plot samples
V = np.zeros((len(t),num_samp))
for i in range(num_samp):
    x = m[0].gen_single(params[i,:])
    V[:,i] = x['data']
    plt.plot(t, V[:, i], color = col['SAMPLE'+str(i+1)], lw=2, label='sample '+str(num_samp-i))

# plot observation
plt.plot(t, y_obs, '--',lw=2, label='observation')
plt.xlabel('time (ms)')
plt.ylabel('voltage (mV)')

ax = plt.gca()
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles[::-1], labels[::-1], bbox_to_anchor=(1.3, 1), loc='upper right')

ax.set_xticks([0, duration/2, duration])
ax.set_yticks([-80, -20, 40]);

plt.savefig(os.path.join(base_dir, 'result.png'), dpi=400)
jan-matthis commented 4 years ago

Okay, so we narrowed down the problem to the multiprocessing generator then.

MPGenerator uses Python's multiprocessing module to create parallel worker processes for simulations: https://github.com/mackelab/delfi/blob/master/delfi/generator/MPGenerator.py

When you set up MPGenerator, you create 15 processes. How many CPU cores does your machine have? Would it work if you used a smaller number of processes, say just 2?
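To isolate the problem from delfi, one could reproduce the fork-based worker setup that MPGenerator relies on using the stdlib alone. This is a hedged sketch, not delfi code: `simulate` is a hypothetical stand-in for one HH run, and switching the start method to `"spawn"` avoids `fork()` entirely, which may matter when an MPI runtime is loaded:

```python
import multiprocessing as mp

def simulate(seed):
    # hypothetical stand-in for a single HH simulation
    import numpy as np
    rng = np.random.RandomState(seed)
    return float(rng.randn(1000).sum())

def run_parallel(n_jobs, n_workers=2, start_method="fork"):
    # "fork" mirrors the default multiprocessing behaviour on Linux;
    # "spawn" starts fresh interpreters and sidesteps the Open MPI fork() warning
    ctx = mp.get_context(start_method)
    with ctx.Pool(processes=n_workers) as pool:
        return pool.map(simulate, range(n_jobs))

if __name__ == "__main__":
    results = run_parallel(n_jobs=4)
    print(len(results))
```

If this minimal version also crashes under the MPI environment, the issue is with combining MPI and fork() rather than with delfi itself.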

holmosaint commented 4 years ago

There are 56 cores, and sometimes I can finish the whole run with 15 processes without any bugs. But the error occurs somewhat randomly.

At the beginning of the program, it prints a warning (sorry for not mentioning this earlier):

An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process.  Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption.  The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

  Local host:          e2 (PID 21477)
  MPI_COMM_WORLD rank: 0

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
jan-matthis commented 4 years ago

It's tricky to debug this, especially when it's not occurring every time.

Here are some general hints for debugging multiprocessing problems: https://softwareengineering.stackexchange.com/questions/126940/debug-multiprocessing-in-python

Especially the comments regarding logging might be helpful.
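For instance, the stdlib can route each worker's internal events (start, exit, task errors) to stderr, tagged with the process name, which helps identify which child dies; a minimal sketch, with `simulate` a hypothetical placeholder for the actual simulation:

```python
import logging
import multiprocessing as mp

# route multiprocessing's internal log records to stderr, tagged per process
logger = mp.log_to_stderr()
logger.setLevel(logging.INFO)

def simulate(seed):
    # hypothetical worker; the process name in each record shows who emitted it
    logger.info("starting simulation with seed %d", seed)
    return seed * 2

if __name__ == "__main__":
    with mp.Pool(processes=2) as pool:
        print(pool.map(simulate, range(4)))
```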

CC'ing @kaandocal, @dgreenberg, @ppjgoncalves, who might have additional comments.

holmosaint commented 4 years ago

Yes, indeed. I'll check whether the problem lies in my system's configuration.

Does the fork() warning occur on your system?

kaandocal commented 4 years ago

Based on the warning, it seems to me that the MPI setup is not compatible with fork(), i.e., creating subprocesses on the fly. The MPGenerator class does exactly that, which may be what causes the crashes. (For reference, the warning does not occur on my system.)

From the error message it appears that the program crashes while a numpy function is being called, but it would be helpful to get a more detailed traceback. Can you try running Python with the -X faulthandler command-line option? Alternatively, add the following to the code itself:

    import faulthandler
    faulthandler.enable()

This should give you a Python traceback when the segmentation fault occurs. Does the problem always occur during the same function call? If so, trying to rewrite that function call may help. Otherwise this is likely an issue with MPI/multiprocessing/combining the two.
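To check that the handler is active and preview the kind of output it produces, one can dump the current stack manually; this sketch only demonstrates the mechanism and does not trigger a real segfault:

```python
import faulthandler
import sys

faulthandler.enable()   # installs handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS
assert faulthandler.is_enabled()

# dump_traceback prints the Python stack of every thread to stderr --
# the same format you would see at the moment of a crash
faulthandler.dump_traceback(file=sys.stderr)
```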

jan-matthis commented 4 years ago

Closing due to inactivity but feel free to re-open in case you manage to collect more debug logs.