ratt-ru / QuartiCal

CubiCal, but with greater power.
MIT License

Creating the net gain breaks down when using multiple Dask workers #99

Closed JSKenyon closed 3 years ago

JSKenyon commented 3 years ago

As it says in the title, output.net_gain=True causes problems (specifically with respect to pickling) when using multiple Dask workers. This shouldn't be difficult to fix; this issue just serves as a warning/reminder.

JSKenyon commented 3 years ago

I have found a solution, but will document the problem here. This is the traceback:

Traceback (most recent call last):
  File "/home/jonathan/venvs/qcenv/lib/python3.8/site-packages/distributed/batched.py", line 93, in _background_send
    nbytes = yield self.comm.write(
  File "/home/jonathan/venvs/qcenv/lib/python3.8/site-packages/tornado/gen.py", line 762, in run
    value = future.result()
  File "/home/jonathan/venvs/qcenv/lib/python3.8/site-packages/distributed/comm/tcp.py", line 243, in write
    frames = await to_frames(
  File "/home/jonathan/venvs/qcenv/lib/python3.8/site-packages/distributed/comm/utils.py", line 50, in to_frames
    return _to_frames()
  File "/home/jonathan/venvs/qcenv/lib/python3.8/site-packages/distributed/comm/utils.py", line 33, in _to_frames
    return list(protocol.dumps(msg, **kwargs))
  File "/home/jonathan/venvs/qcenv/lib/python3.8/site-packages/distributed/protocol/core.py", line 76, in dumps
    frames[0] = msgpack.dumps(msg, default=_encode_default, use_bin_type=True)
  File "/home/jonathan/venvs/qcenv/lib/python3.8/site-packages/msgpack/__init__.py", line 35, in packb
    return Packer(**kwargs).pack(o)
  File "msgpack/_packer.pyx", line 292, in msgpack._cmsgpack.Packer.pack
  File "msgpack/_packer.pyx", line 298, in msgpack._cmsgpack.Packer.pack
  File "msgpack/_packer.pyx", line 295, in msgpack._cmsgpack.Packer.pack
  File "msgpack/_packer.pyx", line 264, in msgpack._cmsgpack.Packer._pack
  File "msgpack/_packer.pyx", line 231, in msgpack._cmsgpack.Packer._pack
  File "msgpack/_packer.pyx", line 231, in msgpack._cmsgpack.Packer._pack
  File "msgpack/_packer.pyx", line 264, in msgpack._cmsgpack.Packer._pack
  File "msgpack/_packer.pyx", line 231, in msgpack._cmsgpack.Packer._pack
  File "msgpack/_packer.pyx", line 231, in msgpack._cmsgpack.Packer._pack
  File "msgpack/_packer.pyx", line 231, in msgpack._cmsgpack.Packer._pack
  File "msgpack/_packer.pyx", line 285, in msgpack._cmsgpack.Packer._pack
  File "/home/jonathan/venvs/qcenv/lib/python3.8/site-packages/distributed/protocol/core.py", line 57, in _encode_default
    sub_header, sub_frames = serialize_and_split(
  File "/home/jonathan/venvs/qcenv/lib/python3.8/site-packages/distributed/protocol/serialize.py", line 425, in serialize_and_split
    header, frames = serialize(x, serializers, on_error, context)
  File "/home/jonathan/venvs/qcenv/lib/python3.8/site-packages/distributed/protocol/serialize.py", line 251, in serialize
    return serialize(
  File "/home/jonathan/venvs/qcenv/lib/python3.8/site-packages/distributed/protocol/serialize.py", line 297, in serialize
    headers_frames = [
  File "/home/jonathan/venvs/qcenv/lib/python3.8/site-packages/distributed/protocol/serialize.py", line 298, in <listcomp>
    serialize(
  File "/home/jonathan/venvs/qcenv/lib/python3.8/site-packages/distributed/protocol/serialize.py", line 349, in serialize
    raise TypeError(msg, str(x)[:10000])
TypeError: ('Could not serialize object of type tuple.', '(subgraph_callable-464cf43f-bd73-4c4f-b3c2-636d62260b93, (<function concatenate_axes at 0x7f92de232310>, [["(\'stack-1b1f98c2f3e26614aeffec434f095767\', 0, 0, 0)"]], [0, 2]), (<function concatenate_axes at 0x7f92de232310>, [["(\'stack-eeda88356e5cd160e293990558bb7e20\', 0, 0, 0)"]], [0, 2]), array([[0]], dtype=int32), (<class \'tuple\'>, [106, 64, 28, 1, 4]), 4, "(\'G-gain-c261af76dbb760d439f93f9840240dca\', 0, 0, 0, 0, 0)")')

The actual error (I went and found it) is:

*** _pickle.PicklingError: Could not pickle object as excessively deep recursion required.

This stems from the following blockwise call:

https://github.com/JSKenyon/QuartiCal/blob/416370b2e6f83b10a2831d8516797bb52c592e7f/quartical/gains/datasets.py#L266-L279

It seems that combine_gains (a Numba function created with generated_jit) somehow confuses the pickling. The problem goes away when the Numba function is wrapped in a plain Python function.
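The wrapping fix can be illustrated without Numba: pickle serialises a module-level function by reference (its module and qualified name), so a plain wrapper defined at module scope stays picklable even when the callable it delegates to is not. A minimal sketch, in which make_combiner and combine_gains_impl are hypothetical stand-ins for the generated_jit dispatcher, not QuartiCal's actual code:

```python
import pickle

# Hypothetical stand-in for a generated_jit dispatcher: a factory that
# returns a dynamically created callable (a lambda). Its __qualname__
# is "make_combiner.<locals>.<lambda>", which pickle cannot look up.
def make_combiner():
    return lambda a, b: a * b

combine_gains_impl = make_combiner()

# Plain module-level wrapper: pickle serialises it by reference, never
# inspecting the unpicklable callable it delegates to.
def combine_gains(a, b):
    return combine_gains_impl(a, b)

def is_picklable(obj):
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False

print(is_picklable(combine_gains_impl))  # False: local lambda from a factory
print(is_picklable(combine_gains))       # True: resolved by module path
```

Note that pickling by reference only records the name; the workers must still be able to import the wrapper, which is why it has to live at module level.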

sjperkins commented 3 years ago

combine_gains can return a lambda which is not pickleable.

sjperkins commented 3 years ago

> combine_gains can return a lambda which is not pickleable.

Wrong branch. However, removing the call to coerce_literals makes combine_gains pickleable.
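The coerce_literals interaction can be mimicked in plain Python: any decorator that replaces a module-level function with a closure defined inside the decorator leaves the result with a local __qualname__ that pickle cannot resolve. A sketch under that assumption; this coerce_literals is a hypothetical stand-in, not QuartiCal's real implementation:

```python
import pickle

# Hypothetical stand-in for coerce_literals: it returns a closure whose
# __qualname__ is "coerce_literals.<locals>.wrapper", a name pickle
# cannot look up in the module.
def coerce_literals(fn):
    def wrapper(*args):
        return fn(*args)
    return wrapper

@coerce_literals
def combine_gains(a, b):  # decorated: replaced by the local closure
    return a * b

def plain_combine_gains(a, b):  # undecorated: picklable by reference
    return a * b

def is_picklable(obj):
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False

print(is_picklable(combine_gains))        # False
print(is_picklable(plain_combine_gains))  # True
```

This is consistent with the observation above: removing the decorator leaves an ordinary module-level function that pickle can serialise by reference.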

JSKenyon commented 3 years ago

Closing - I am wrapping the function for now. If that begins failing, I can return to the old behaviour, which didn't use coerce_literals.