rodluger / starry

Tools for mapping stars and planets.
https://starry.readthedocs.io
MIT License

Cannot pickle Map or System object (incompatibility with emcee) #292

Closed by iancrossfield 3 years ago

iancrossfield commented 3 years ago

Describe the bug
When I try to use starry in the MCMC sampler `emcee` (via either the `Map` object directly or via the `System` object), `emcee` fails in both cases because the `Map` object cannot be pickled. Specifically:

```
AttributeError: Can't pickle local object 'Map.<locals>.Map'
```

To Reproduce

```python
import starry
import pickle

map = starry.Map(udeg=2, rv=True)
with open('spam.pickle', 'wb') as f:  # pickle requires a binary-mode file
    pickle.dump(map, f)
```

This results in the error listed above.

Expected behavior
I expect to be able to use starry `Map` or `System` objects successfully in Bayesian inference calculations (specifically MCMC; very specifically, via `emcee`).


rodluger commented 3 years ago

You must be passing the starry.Map instance as an argument to the log probability function. Currently, the Map class is generated from a class factory, so it's not directly picklable. I think it would be straightforward to fix this, but I would recommend a different, likely much faster approach than what you're currently doing: switch to lazy evaluation mode and explicitly compile the theano function yourself. There's a section in the docs showing how to do this.
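The pickling failure can be reproduced without starry at all: any class defined inside a function (which is what a class factory does) is invisible to pickle's name-based lookup. A minimal sketch, with purely illustrative names:

```python
import pickle

def make_class():
    # The class only exists in this function's local scope, so pickle
    # cannot find it by a module-level name when serializing instances.
    class Inner:
        pass
    return Inner

obj = make_class()()

try:
    pickle.dumps(obj)
    picklable = True
except Exception:
    # e.g. AttributeError: Can't pickle local object 'make_class.<locals>.Inner'
    picklable = False

print(picklable)  # → False
```

This is the same failure mode emcee's multiprocessing pool hits when it tries to ship the log-probability closure (and everything it references) to the workers.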

The reason it's faster is that you're compiling all the operations (instantiating the map, setting the attributes, computing the flux, computing the likelihood) into a single executable function. You're also baking the dataset into the compiled routine, so there's much, much less data transfer across cores.
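The data-transfer point can be seen with a plain-Python sketch (illustrative names only, no starry or theano involved): the per-call payload a pool must pickle is far smaller when the dataset is baked in rather than passed as an argument.

```python
import pickle

data = list(range(100_000))  # stand-in for a "large" dataset

def log_prob_with_arg(x, data):
    # the pool must ship `data` to a worker on every call
    return -0.5 * sum((x - d) ** 2 for d in data)

def log_prob_baked(x):
    # `data` is captured at module scope, so only `x` crosses the pipe
    return -0.5 * sum((x - d) ** 2 for d in data)

per_call_payload = len(pickle.dumps((0.0, data)))  # args for log_prob_with_arg
baked_payload = len(pickle.dumps((0.0,)))          # args for log_prob_baked
print(per_call_payload > 100 * baked_payload)  # → True
```

A compiled theano function takes this further by also fusing all the ops into one callable, but the serialization saving alone is already substantial.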

I usually recommend using pymc3 instead to do sampling, since that's what starry is optimized for (and it can be significantly faster for many problems). But following the steps above should work for you. Let me know if you run into issues.

iancrossfield commented 3 years ago

Thanks @rodluger. After wrestling with theano (entirely new to me!) for a while and rewriting my model function, I was able to get it to compile successfully. The new 'fast' (non-lazy) version runs, but emcee still bombs out with a long error trace, ending in:

```
MemoryError: std::bad_alloc
Apply node that caused the error: <starry._core.ops.filter.FOp object at 0x7ff4b2a6dfd0>(AdvancedIncSubtensor1{inplace,set}.0, IncSubtensor{InplaceSet;int64}.0)
Toposort index: 52
Inputs types: [TensorType(float64, vector), TensorType(float64, vector)]
Inputs shapes: [(3,), (16,)]
Inputs strides: [(8,), (8,)]
Inputs values: [array([-1. , 0.3409965 , 0.29015569]), 'not shown']
Outputs clients: [[SparseDot(SparseConstant{csc,float64,shape=(36, 36),nnz=124}, <starry._core.ops.filter.FOp object at 0x7ff4b2a6dfd0>.0), InplaceDimShuffle{1,0}(<starry._core.ops.filter.FOp object at 0x7ff4b2a6dfd0>.0)]]
```

```
Backtrace when the node is created (use Theano flag traceback__limit=N to make it longer):
  File "/home/ianc/anaconda3/lib/python3.7/site-packages/starry/_core/utils.py", line 104, in wrapper
    return func(instance, *args)
  File "/home/ianc/anaconda3/lib/python3.7/site-packages/starry/_core/core.py", line 204, in flux
    return tt.dot(self.X(theta, xo, yo, zo, ro, inc, obl, u, f), y)
  File "/home/ianc/anaconda3/lib/python3.7/site-packages/starry/_core/utils.py", line 104, in wrapper
    return func(instance, *args)
  File "/home/ianc/anaconda3/lib/python3.7/site-packages/starry/_core/core.py", line 175, in X
    F = self.F(u, f)
  File "/home/ianc/anaconda3/lib/python3.7/site-packages/starry/_core/utils.py", line 104, in wrapper
    return func(instance, *args)
  File "/home/ianc/anaconda3/lib/python3.7/site-packages/starry/_core/core.py", line 137, in F
    return self._F(u, f)
  File "/home/ianc/anaconda3/lib/python3.7/site-packages/theano/graph/op.py", line 250, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "/home/ianc/anaconda3/lib/python3.7/site-packages/starry/_core/ops/filter.py", line 25, in make_node
    outputs = [tt.TensorType(inputs[-1].dtype, (False, False))()]

HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
```

System monitoring tools confirm that I'm nowhere near running out of system memory, and searching online for "theano" "MemoryError: std::bad_alloc" turns up zero results.

I don't think this is a problem with starry; more likely it's some issue arising from the unholy fusion of theano with my byzantine, and rather antiquated, error functions (specifically, I'm using https://crossfield.ku.edu/python/phasecurves.html#phasecurves.lnprobfunc). I'll try to explore the PyMC3 options mentioned above, though even the tutorials seem rather opaque to this frequentist-at-heart astronomer.

rodluger commented 3 years ago

Oh, quite the opposite -- I think that's a starry segfault. I believe you're instantiating the starry map like

```python
map = starry.Map(udeg=2, rv=True)
```

If so, can you try doing

```python
map = starry.Map(ydeg=1, udeg=2, rv=True)
```

and let me know if that fixes the issue? This could be an old bug popping up again when the degree of the Ylm expansion is zero.

iancrossfield commented 3 years ago

Adding ydeg=1 in the initialization results in about the same error trace as I posted in my previous comment, above.

But I CAN successfully run starry's RM calculation wrapped inside emcee IF I set `pool=None`. So no multiprocessing, but at least it runs -- probably in less time than it would take me to learn PyMC3, too ;)

rodluger commented 3 years ago

OK, I can confirm the issue is in starry. Here's a MWE that triggers the error you're seeing:

```python
import starry
import numpy as np
import emcee
from multiprocessing import Pool
import theano
import theano.tensor as tt

if __name__ == "__main__":

    # Generate dataset
    np.random.seed(0)
    theta = np.linspace(0, 360, 100)
    data = 0.1 * np.random.randn(len(theta))

    def log_prob_(x):
        map = starry.Map(rv=True)
        map.inc = x[0]
        map.veq = x[1]
        model = map.rv(theta=theta)
        return -0.5 * tt.sum((model - data) ** 2)

    x = tt.dvector()
    log_prob = theano.function([x], log_prob_(x))

    nwalkers = 4
    ndim = 2
    niter = 100
    guess = np.random.randn(nwalkers, ndim)

    with Pool() as pool:
        sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob, pool=pool)
        sampler.run_mcmc(guess, niter, progress=True)
```

I'll look into it.

rodluger commented 3 years ago

I can also confirm that this seems to happen only during multiprocessing: setting `pool=None` fixes the issue. When compiling starry in debug mode, I get the following Eigen error somewhere within the call to `computeF()`:

```
Assertion failed: (lhs.cols() == rhs.rows() && "invalid matrix product" && "if you wanted a coeff-wise or a dot product use the respective explicit functions"), function Product, file starry/_core/ops/lib/vendor/eigen_3.3.5/Eigen/src/Core/Product.h, line 97.
```

Looks like `B.U1` is uninitialized when it gets used here. If I print its shape I get `(145, 140669174152478)`, which is bonkers!

rodluger commented 3 years ago

I figured it out. The issue was on this line. For some reason I had declared `B` as a reference. Reverting that fixes the issue for multiprocessing.

rodluger commented 3 years ago

@iancrossfield Feel free to re-open if you run into more trouble or have any other questions. The code is now working on master, and I released a patch that should be available on PyPI within the hour (v1.1.3).