All scene-graph objects should be pickleable

jacklovell commented 6 years ago

My use case: I have a TargettedPixel observer which samples a grid cell with 1000 rays. I repeat this for every cell in my reconstruction grid to build up a volumetric sensitivity matrix for tomographic inversions. I am currently attempting to do this for a large grid (751263 cells). The nature of the problem means that each individual observation is pretty quick, but there are a lot of observations.

I'm currently using the MulticoreEngine render engine for this, which sets up and tears down worker processes for each observation. Because the observations are so quick, this leads to significant overhead, as can be seen from one of the jobs I submitted, which used 16 worker processes for each observation:

 User Time        = 288:02:50
 System Time      = 272:04:52
 Wallclock Time   = 155:20:04

(this job terminated prematurely for an unrelated reason).

I'd like to change the way the problem is chunked:

- Use the SerialEngine for each observation, to avoid creating and destroying processes for each job.
- Use concurrent.futures.ProcessPoolExecutor to parallelise the loop over the grid cells.
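
For illustration, the chunking described above might look roughly like the sketch below. It is only an outline under stated assumptions: `observe_cell` and `build_sensitivity_rows` are hypothetical names, and the body of `observe_cell` is a placeholder for a single serial observation rather than real Raysect code. The `max_workers` and `chunksize` values are arbitrary.

```python
import concurrent.futures


def observe_cell(cell_index):
    """Hypothetical placeholder for one serial observation of a grid cell.

    In the real problem this would aim the observer at the cell and run the
    observation with a serial render engine. Here it returns a dummy row.
    """
    return (cell_index, float(cell_index) * 0.5)


def build_sensitivity_rows(cell_indices,
                           executor_cls=concurrent.futures.ProcessPoolExecutor,
                           max_workers=16, chunksize=256):
    """Map observe_cell over all cells using one long-lived pool of workers."""
    with executor_cls(max_workers=max_workers) as pool:
        # chunksize batches many cheap tasks into each inter-process message.
        # Results are returned in the same order as cell_indices.
        return list(pool.map(observe_cell, cell_indices, chunksize=chunksize))
```

The pool is created once and reused for every cell, so the process start-up cost is paid once per worker rather than once per observation. When run as a script, the call to `build_sensitivity_rows` should sit under an `if __name__ == "__main__":` guard. Anything passed to the workers as an argument is still pickled, so this pattern does not by itself remove the need for picklable scene objects.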

The problem I run into now is that the mesh describing the vessel surfaces, which needs to be available to each process in the pool, can't be pickled:

Traceback (most recent call last):
  File "/usr/local/depot/Python-3.5.1/lib/python3.5/multiprocessing/queues.py", line 241, in _feed
    obj = ForkingPickler.dumps(obj)
  File "/usr/local/depot/Python-3.5.1/lib/python3.5/multiprocessing/reduction.py", line 50, in dumps
    cls(buf, protocol).dump(obj)
  File "stringsource", line 2, in raysect.primitive.mesh.mesh.MeshData.__reduce_cython__
TypeError: self._nodes cannot be converted to a Python object for pickling

I've tested this with a simple example, which also fails:

import pickle

from raysect.primitive import Mesh
from raysect.optical import World
from raysect.optical.material import AbsorbingSurface

world = World()
# Load a single vessel component into the scene graph.
Mesh.from_file('/projects/cadmesh/mast/mastu-full/vacuum_vessel/BOTTOM_ENDPLATE.rsm', parent=world, material=AbsorbingSurface(), name='bottom endplate')
# Pickling the mesh primitive raises the TypeError shown below.
pickle.dumps(world.children[0])

TypeError                                 Traceback (most recent call last)
<ipython-input-10-248559f83ee0> in <module>()
----> 1 pickle.dumps(world.children[0])

/home/jlovell/cherab/raysect/raysect/primitive/mesh/mesh.cpython-35m-x86_64-linux-gnu.so in raysect.primitive.mesh.mesh.MeshData.__reduce_cython__()

TypeError: self._nodes cannot be converted to a Python object for pickling

Is it feasible to implement __getstate__ and __setstate__ for the MeshData class to enable the mesh to be pickled?
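
As a rough illustration of the general pattern, here is a pure-Python stand-in, not the actual MeshData implementation. `MeshDataSketch` and its attributes are hypothetical. A `memoryview` is used as the unpicklable member because plain Python cannot pickle it either. The idea is to export only plain, picklable data in `__getstate__` and rebuild the internal buffers in `__setstate__`.

```python
import pickle
from array import array


class MeshDataSketch:
    """Hypothetical stand-in for a class that holds an unpicklable buffer view."""

    def __init__(self, vertices):
        self._buffer = array("d", vertices)
        # memoryview objects cannot be pickled, so default pickling would fail.
        self._view = memoryview(self._buffer)

    def __getstate__(self):
        # Export only plain Python data that pickle can handle.
        return {"vertices": self._buffer.tolist()}

    def __setstate__(self, state):
        # Rebuild the buffer and the view from the plain data.
        self._buffer = array("d", state["vertices"])
        self._view = memoryview(self._buffer)

    def vertex(self, index):
        return self._view[index]


original = MeshDataSketch([0.0, 1.0, 2.0])
restored = pickle.loads(pickle.dumps(original))
```

The restored object has its own freshly built buffer and view, so it behaves like the original without sharing memory with it. A Cython extension type would need the equivalent conversion for each C-level attribute, which this sketch does not attempt to show.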

mattngc commented 6 years ago

It's going to be difficult to support this type of multiprocessing without allowing the whole scene-graph to be pickled. Whilst pickle works automatically on normal Python objects, its automatic support for Cython objects is limited to simple ones. Because Cython is used extensively throughout the Raysect core, we will need to manually add __getstate__() and __setstate__() methods to all the scene-graph objects. This should be a longer-term goal anyway. Let's add this to our next release plan.

mattngc commented 6 years ago

Issue #207 is related to this issue, but will be absorbed by it since this one is broader.

All scene-graph objects should be pickleable #226