lesgourg / class_public

Public repository of the Cosmic Linear Anisotropy Solving System (master for the most recent version of the standard code; GW_CLASS to include Cosmic Gravitational Wave Background anisotropies; classnet branch for acceleration with neural networks; ExoCLASS branch for exotic energy injection; class_matter branch for FFTlog)

MPI parallelization and cython conflict #567

Open marcobonici opened 10 months ago

marcobonici commented 10 months ago

I am trying to run some chains in parallel on an LSF cluster, but I get the following error message:

Initialising ensemble of 128 walkers...
Traceback (most recent call last):
  File "/home/mbonici/zeus_chains.py", line 102, in <module>
    sampler.run_mcmc(pos, steps)
  File "/home/mbonici/miniconda3/lib/python3.9/site-packages/zeus/ensemble.py", line 419, in run_mcmc
    for _ in self.sample(start,
  File "/home/mbonici/miniconda3/lib/python3.9/site-packages/zeus/ensemble.py", line 482, in sample
    Z, blobs = self.compute_log_prob(X)
  File "/home/mbonici/miniconda3/lib/python3.9/site-packages/zeus/ensemble.py", line 355, in compute_log_prob
    results = list(self.distribute(self.logprob_fn, (p[i] for i in range(len(p)))))
  File "/home/mbonici/miniconda3/lib/python3.9/site-packages/zeus/parallel.py", line 119, in map
    self.comm.send(task, dest=worker, tag=taskid)
  File "mpi4py/MPI/Comm.pyx", line 1406, in mpi4py.MPI.Comm.send
  File "mpi4py/MPI/msgpickle.pxi", line 211, in mpi4py.MPI.PyMPI_send
  File "mpi4py/MPI/msgpickle.pxi", line 145, in mpi4py.MPI.pickle_dump
  File "mpi4py/MPI/msgpickle.pxi", line 131, in mpi4py.MPI.cdumps
  File "stringsource", line 2, in classy.Class.__reduce_cython__
TypeError: no default __reduce__ due to non-trivial __cinit__

(Each of the two MPI ranks prints the same traceback, so the lines appear interleaved in the raw log.)

The relevant line is the last one: TypeError: no default __reduce__ due to non-trivial __cinit__, raised when mpi4py tries to pickle the classy.Class object before sending it to a worker.
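
For reference, the failure can be reproduced without MPI at all: classy.Class is a Cython extension type with a non-trivial __cinit__ and no __reduce__, so a plain pickle call on it should raise the same error (a minimal sketch, assuming classy is importable in the environment):

import pickle
from classy import Class

cosmo = Class()
# Expected to fail with: TypeError: no default __reduce__ due to non-trivial __cinit__
pickle.dumps(cosmo)

mpi4py's comm.send pickles every task it ships to a worker, which is why the error surfaces inside msgpickle.pxi in the traceback above.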

schoeneberg commented 9 months ago

Dear @marcobonici, this is related to the fact that CLASS objects are currently not pickle-able. I will push this to an issue in the private repo, and hopefully we will be able to fix it very soon. For now, one would have to rewrite the log-probability function a bit so that it does not use the classy object itself but only the parameters, reconstructing a classy object internally to that function (if that is something you can do).
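
A minimal sketch of such a rewrite, under the assumption that only the parameter vector is passed to the workers (the parameter names and the Gaussian placeholder likelihood below are purely illustrative, not the actual likelihood in zeus_chains.py):

import numpy as np
from classy import Class

def log_prob(theta):
    # Rebuild the classy object inside the worker instead of capturing it in a
    # closure, so only the plain parameter vector is pickled and sent over MPI.
    omega_cdm, h = theta
    cosmo = Class()
    cosmo.set({'output': 'mPk', 'P_k_max_h/Mpc': 1., 'omega_cdm': omega_cdm, 'h': h})
    try:
        cosmo.compute()
        # Illustrative Gaussian likelihood on P(k=0.1/Mpc, z=0); replace with the real one.
        pk = cosmo.pk(0.1, 0.)
        logl = -0.5 * ((pk - 1.0e4) / 1.0e3) ** 2
    except Exception:
        logl = -np.inf
    finally:
        # Free the C-level structures so repeated evaluations do not leak memory.
        cosmo.struct_cleanup()
        cosmo.empty()
    return logl

With a log-probability function of this shape passed to the ensemble sampler, zeus only ever serializes theta when it distributes tasks, so the TypeError above should not be triggered; the classy instance lives and dies inside each worker call.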