pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python
https://docs.pymc.io/
Other
8.72k stars 2.02k forks source link

"TypeError: cannot pickle 'fortran' object" for sampling with multiple cores #5864

Closed mathtimm closed 1 year ago

mathtimm commented 2 years ago

I am getting a TypeError when I try to sample with multiple cores, the issue does not appear if cores=1. I am working on the modeling of exoplanet transits using the code exoplanet that makes use of PyMC3.

Complete error traceback ```python `--------------------------------------------------------------------------- TypeError Traceback (most recent call last) Input In [10], in () 1 np.random.seed(42) 3 with model: ----> 4 trace = pm.sample( 5 tune=3000, 6 draws=4000, 7 start=opt, 8 cores=3, 9 chains=2, 10 init="adapt_full", 11 target_accept=0.9, 12 return_inferencedata=True 13 ) File /opt/homebrew/Caskroom/miniforge/base/envs/prose-env3/lib/python3.9/site-packages/pymc3/sampling.py:559, in sample(draws, step, init, n_init, start, trace, chain_idx, chains, cores, tune, progressbar, model, random_seed, discard_tuned_samples, compute_convergence_checks, callback, jitter_max_retries, return_inferencedata, idata_kwargs, mp_ctx, pickle_backend, **kwargs) 557 _print_step_hierarchy(step) 558 try: --> 559 trace = _mp_sample(**sample_args, **parallel_args) 560 except pickle.PickleError: 561 _log.warning("Could not pickle model, sampling singlethreaded.") File /opt/homebrew/Caskroom/miniforge/base/envs/prose-env3/lib/python3.9/site-packages/pymc3/sampling.py:1461, in _mp_sample(draws, tune, step, chains, cores, chain, random_seed, start, progressbar, trace, model, callback, discard_tuned_samples, mp_ctx, pickle_backend, **kwargs) 1458 strace.setup(draws + tune, idx + chain) 1459 traces.append(strace) -> 1461 sampler = ps.ParallelSampler( 1462 draws, 1463 tune, 1464 chains, 1465 cores, 1466 random_seed, 1467 start, 1468 step, 1469 chain, 1470 progressbar, 1471 mp_ctx=mp_ctx, 1472 pickle_backend=pickle_backend, 1473 ) 1474 try: 1475 try: File /opt/homebrew/Caskroom/miniforge/base/envs/prose-env3/lib/python3.9/site-packages/pymc3/parallel_sampling.py:423, in ParallelSampler.__init__(self, draws, tune, chains, cores, seeds, start_points, step_method, start_chain_num, progressbar, mp_ctx, pickle_backend) 421 if mp_ctx.get_start_method() != "fork": 422 if pickle_backend == "pickle": --> 423 step_method_pickled = pickle.dumps(step_method, protocol=-1) 424 elif pickle_backend == "dill": 425 try: TypeError: cannot pickle 'fortran' object` ```

Could this be a version issue ? I am having trouble finding the right combination of pymc3, pymc3-ext and exoplanet.

Thanks a lot in advance !

Versions and main components

lucianopaz commented 2 years ago

By any chance, could you share your model code with us? We've had reports similar to this on Mac M1 chips, but haven't been able to reproduce the error.

mathtimm commented 2 years ago

Yes of course, here it is. It's for 13 distinct light curves obtained with various instruments and in multiple filters.

` with pm.Model() as model:

# Stellar parameters
# -----------------
r_s = pm.Normal('r_s',0.326568  ,0.00988657)
m_s = pm.Normal('m_s',0.30713 ,0.0202867)

# Orbital parameters
# -----------------
t0= pm.Normal('t0',2459384.5731,0.005)
p = pm.Normal('P',16.336545,0.001) 
b = pm.Uniform("b",0.,1.,testval=0.4)

# Keplerian orbit
# ---------------
orbit = xo.orbits.KeplerianOrbit(period=p, t0=t0, r_star=r_s, b=b, m_star=m_s)
pm.Deterministic("a",orbit.a)
pm.Deterministic ('i',orbit.incl * 180/np.pi)
pm.Deterministic('a/r_s', orbit.a / orbit.r_star)

depths =[]
names=[]

# Loop over the instruments
# -------------------------
#Loop over the filters
for i, (filt,ldc) in enumerate(filters.items()):

    # The limb darkening
    u = xo.QuadLimbDark(f"u_{filt}",testval=np.array(ldc))
    star = xo.LimbDarkLightCurve(u)

    # Radius 
    depth = pm.Uniform(f"depth_{filt}", 1e-3, 9e-3,testval=4e-3) 
    ror = pm.Deterministic(f"ror_{filt}",star.get_ror_from_approx_transit_depth(depth, b))
    r_p = pm.Deterministic(f"r_p_{filt}", ror * r_s) # In solar radius 
    r = pm.Deterministic(f'r_{filt}',r_p*1/earth2sun)

    #Loop over all the light curves
    for n, (name, (x, y, yerr, texp, X,f)) in enumerate(data[filt].items()):
        names.append(name)
        if filt == 'zp':
            alpha = pm.Normal(f'alpha_{name}',0.86,0.001)
            y_p = y*(1+alpha) - alpha
        else:
            y_p=y

        # starry light-curve
        light_curves = star.get_light_curve(orbit=orbit, r=r_p, t=x, texp=texp/24/60/60) 
        transit = pm.Deterministic(f"transit_{name}", pm.math.sum(light_curves, axis=-1))

        # Systematics and final model
        c=np.linalg.lstsq(X.T,y,rcond=None)[0]
        w = pm.Flat(f'w_{name}',shape=len(X),testval=np.array(c))
        systematics = pm.Deterministic(f'systematics_{name}',w@X.copy())
        mu = pm.Deterministic(f"mu_{name}", transit+systematics)

        # Likelihood function
        sigma = pm.Uniform(f"sigma_{name}",0.5,4,testval=f) # Adjustment of the errors
        pm.Normal(f"obs_{name}", mu=mu, sd=yerr*sigma, observed=y_p)

# Maximum a posteriori
# --------------------
opt = pmx.optimize(start=model.test_point)`
mathtimm commented 2 years ago

I created a new clean environment and installed exoplanet and exoplanet-ext with pip, the problem seems fixed now. The only difference in terms of version is that I am running the 3.11.5 of PyMC3, but I tried upgrading the package in the other environment and the bug still happens. I'm not sure what is happening but it might not be related to PyMC3 but rather to exoplanet and the compatibility with PyMC3 and Theano-PyMC ?

lucianopaz commented 2 years ago

From the other reports that we had, it seems related to BLAS, scipy and the cholesky decomposition. We haven’t identified the exact configuration options that trigger the error though, but we think that at its core, what is failing is related to this:

from scipy.linalg import lapack
import cloudpickle

cloudpickle.dumps(lapack.cpotrf)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [39], in <cell line: 5>()
      1 from scipy.linalg import lapack
      2 import cloudpickle
----> 5 cloudpickle.dumps(lapack.cpotrf)

File ~/anaconda3/envs/test_env/lib/python3.10/site-packages/cloudpickle/cloudpickle_fast.py:73, in dumps(obj, protocol, buffer_callback)
     69 with io.BytesIO() as file:
     70     cp = CloudPickler(
     71         file, protocol=protocol, buffer_callback=buffer_callback
     72     )
---> 73     cp.dump(obj)
     74     return file.getvalue()

File ~/anaconda3/envs/test_env/lib/python3.10/site-packages/cloudpickle/cloudpickle_fast.py:633, in CloudPickler.dump(self, obj)
    631 def dump(self, obj):
    632     try:
--> 633         return Pickler.dump(self, obj)
    634     except RuntimeError as e:
    635         if "recursion" in e.args[0]:

TypeError: cannot pickle 'fortran' object
lucianopaz commented 2 years ago

It looks like this problem might be solved once numpy/numpy#21767 is fixed.