Closed iancrossfield closed 3 years ago
You must be passing the starry.Map
instance as an argument to the log probability function. Currently, the Map
class is generated from a class factory, so it's not directly picklable. I think it would be straightforward to fix this, but I would recommend a different, likely much faster approach than what you're currently doing: switch to lazy
evaluation mode and explicitly compile the theano
function yourself. There's a section in the docs showing how to do this.
The reason it's faster is that you're compiling all the operations (instantiating the map, setting the attributes, computing the flux, computing the likelihood) into a single executable function. You're also baking in the dataset into the compiled routine, so there's much, much less data transfer across cores.
I usually recommend using pymc3
instead to do sampling, since that's what starry
is optimized for (and it can be significantly faster for many problems). But following the steps above should work for you. Let me know if you run into issues.
Thanks @rodluger . After wrestling with theano (entirely new to me!) for a while and rewriting my model function, I was able to get the function to successfully compile in theano. The new 'fast,' 'non-lazy' version runs successfully, but emcee still bombs out with a long error trace, ending in:
MemoryError: std::bad_alloc Apply node that caused the error: <starry._core.ops.filter.FOp object at 0x7ff4b2a6dfd0>(AdvancedIncSubtensor1{inplace,set}.0, IncSubtensor{InplaceSet;int64}.0) Toposort index: 52 Inputs types: [TensorType(float64, vector), TensorType(float64, vector)] Inputs shapes: [(3,), (16,)] Inputs strides: [(8,), (8,)] Inputs values: [array([-1. , 0.3409965 , 0.29015569]), 'not shown'] Outputs clients: [[SparseDot(SparseConstant{csc,float64,shape=(36, 36),nnz=124}, <starry._core.ops.filter.FOp object at 0x7ff4b2a6dfd0>.0), InplaceDimShuffle{1,0}(<starry._core.ops.filter.FOp object at 0x7ff4b2a6dfd0>.0)]]
Backtrace when the node is created(use Theano flag traceback__limit=N to make it longer): File "/home/ianc/anaconda3/lib/python3.7/site-packages/starry/_core/utils.py", line 104, in wrapper return func(instance, args) File "/home/ianc/anaconda3/lib/python3.7/site-packages/starry/_core/core.py", line 204, in flux return tt.dot(self.X(theta, xo, yo, zo, ro, inc, obl, u, f), y) File "/home/ianc/anaconda3/lib/python3.7/site-packages/starry/_core/utils.py", line 104, in wrapper return func(instance, args) File "/home/ianc/anaconda3/lib/python3.7/site-packages/starry/_core/core.py", line 175, in X F = self.F(u, f) File "/home/ianc/anaconda3/lib/python3.7/site-packages/starry/_core/utils.py", line 104, in wrapper return func(instance, args) File "/home/ianc/anaconda3/lib/python3.7/site-packages/starry/_core/core.py", line 137, in F return self._F(u, f) File "/home/ianc/anaconda3/lib/python3.7/site-packages/theano/graph/op.py", line 250, in call node = self.make_node(inputs, **kwargs) File "/home/ianc/anaconda3/lib/python3.7/site-packages/starry/_core/ops/filter.py", line 25, in make_node outputs = [tt.TensorType(inputs[-1].dtype, (False, False))()]
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
System monitoring tools confirm that I'm nowhere near to running out of system memory, and searching online for "theano" "MemoryError: std::bad_alloc"
reveals zero results.
I don't think this is a problem with starry; more likely, it's probably some issue relating to the unholy fusion of theano with my byzantine, and rather antiquated, error functions (specifically, I'm using https://crossfield.ku.edu/python/phasecurves.html#phasecurves.lnprobfunc). I'll try to explore the PyMC3 options mentioned above, though even their 'Tutorials' seem rather opaque to this frequentist-at-heart astronomer.
Oh, quite the opposite -- I think that's a starry segfault. I believe you're instantiating the starry map like
map = starry.Map(udeg=2, rv=True)
If so, can you try doing
map = starry.Map(ydeg=1, udeg=2, rv=True)
and let me know if that fixes the issue? This could be an old bug popping up again when the degree of the Ylm expansion is zero.
Adding ydeg=1
in the initialization results in about the same error trace as I posted in my previous comment, above.
But I CAN manage to successfully run starry's RM calculation wrapped inside of emcee
IF I set pool=None
. So no multiprocessing, but at least it does run -- probably in less time than it would take me to learn PyMC3, too ;)
OK, I can confirm the issue is in starry
. Here's a MWE that triggers the error you're seeing:
import starry
import numpy as np
import emcee
from multiprocessing import Pool
import theano
import theano.tensor as tt
if __name__ == "__main__":
# Generate dataset
np.random.seed(0)
theta = np.linspace(0, 360, 100)
data = 0.1 * np.random.randn(len(theta))
def log_prob_(x):
map = starry.Map(rv=True)
map.inc = x[0]
map.veq = x[1]
model = map.rv(theta=theta)
return -0.5 * tt.sum((model - data) ** 2)
x = tt.dvector()
log_prob = theano.function([x], log_prob_(x))
nwalkers = 4
ndim = 2
niter = 100
guess = np.random.randn(nwalkers, ndim)
with Pool() as pool:
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob, pool=pool)
sampler.run_mcmc(guess, niter, progress=True)
I'll look into it.
I can also confirm this seems to only happen during multiprocessing. Setting pool=None
fixes the issue. When compiling starry
in debug mode, I get the following Eigen
error somewhere within the call to computeF():
Assertion failed: (lhs.cols() == rhs.rows() && "invalid matrix product" && "if you wanted a coeff-wise or a dot product use the respective explicit functions"), function Product, file starry/_core/ops/lib/vendor/eigen_3.3.5/Eigen/src/Core/Product.h, line 97.
Looks like B.U1
is uninitialized when it gets used here. If I print its shape I get (145, 140669174152478)
, which is bonkers!
I figured it out. The issue was on this line. For some reason I had declared B
as a reference. Reverting that fixes the issue for multiprocessing.
@iancrossfield Feel free to re-open if you run into more trouble or have any other questions. The code is now working on master
, and I released a patch that should be available on PyPI
within the hour (v1.1.3
).
Describe the bug When I try to use starry in the MCMC sampler `emcee' (via either the Map object directly, or via the System object), emcee fails in both cases because the Map object cannot be pickled. Specifically: "AttributeError: Can't pickle local object 'Map..Map'"
To Reproduce
Results in the error listed above.
Expected behavior I expect to be able to successfully use starry Map or System objects in Bayesian inference calculations (specifically, MCMC; very specifically, via 'emcee').
Your setup (please complete the following information):