limix / bgen-reader-py

A BGEN file format reader.
MIT License

pickle error on dask cluster #21

Closed jeremymcrae closed 4 years ago

jeremymcrae commented 4 years ago

Hi,

I'm getting an error when using bgen_reader on a distributed dask cluster. I can load the bgen file in, but I get an error when I try to access the genotypes after that. This works without any problems if I don't start a cluster though. Is this something that could be fixed, or could it be specific to my cluster setup?

>>> from bgen_reader import read_bgen
>>> cluster = get_cluster()  # code omitted, but starts cluster on Sun Grid Engine
>>> cluster
SGECluster('tcp://10.112.80.16:35423', workers=5, threads=10, memory=200.00 GB)
>>> bgen = read_bgen(PATH_TO_BGEN, samples_filepath=PATH_TO_SAMPLES)
>>> bgen['genotype'][0].compute()
Traceback (most recent call last):
  File "/home/jmcrae/.local/miniconda/lib/python3.7/site-packages/distributed/worker.py", line 3244, in dumps_function
    result = cache_dumps[func]
  File "/home/jmcrae/.local/miniconda/lib/python3.7/site-packages/distributed/utils.py", line 1439, in __getitem__
    value = super().__getitem__(key)
  File "/home/jmcrae/.local/miniconda/lib/python3.7/collections/__init__.py", line 1027, in __getitem__
    raise KeyError(key)
KeyError: <function _get_read_genotype.<locals>._read_genotype at 0x2aab08752b00>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jmcrae/.local/miniconda/lib/python3.7/site-packages/distributed/protocol/pickle.py", line 38, in dumps
    result = pickle.dumps(x, protocol=pickle.HIGHEST_PROTOCOL)
AttributeError: Can't pickle local object '_get_read_genotype.<locals>._read_genotype'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jmcrae/.local/miniconda/lib/python3.7/site-packages/dask/base.py", line 165, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/home/jmcrae/.local/miniconda/lib/python3.7/site-packages/dask/base.py", line 436, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/jmcrae/.local/miniconda/lib/python3.7/site-packages/distributed/client.py", line 2565, in get
    actors=actors,
  File "/home/jmcrae/.local/miniconda/lib/python3.7/site-packages/distributed/client.py", line 2492, in _graph_to_futures
    "tasks": valmap(dumps_task, dsk3),
  File "/home/jmcrae/.local/miniconda/lib/python3.7/site-packages/toolz/dicttoolz.py", line 83, in valmap
    rv.update(zip(iterkeys(d), map(func, itervalues(d))))
  File "/home/jmcrae/.local/miniconda/lib/python3.7/site-packages/distributed/worker.py", line 3282, in dumps_task
    return {"function": dumps_function(task[0]), "args": warn_dumps(task[1:])}
  File "/home/jmcrae/.local/miniconda/lib/python3.7/site-packages/distributed/worker.py", line 3246, in dumps_function
    result = pickle.dumps(func)
  File "/home/jmcrae/.local/miniconda/lib/python3.7/site-packages/distributed/protocol/pickle.py", line 51, in dumps
    return cloudpickle.dumps(x, protocol=pickle.HIGHEST_PROTOCOL)
  File "/home/jmcrae/.local/miniconda/lib/python3.7/site-packages/cloudpickle/cloudpickle.py", line 1125, in dumps
    cp.dump(obj)
  File "/home/jmcrae/.local/miniconda/lib/python3.7/site-packages/cloudpickle/cloudpickle.py", line 482, in dump
    return Pickler.dump(self, obj)
  File "/home/jmcrae/.local/miniconda/lib/python3.7/pickle.py", line 437, in dump
    self.save(obj)
  File "/home/jmcrae/.local/miniconda/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/jmcrae/.local/miniconda/lib/python3.7/site-packages/cloudpickle/cloudpickle.py", line 556, in save_function
    return self.save_function_tuple(obj)
  File "/home/jmcrae/.local/miniconda/lib/python3.7/site-packages/cloudpickle/cloudpickle.py", line 758, in save_function_tuple
    save(state)
  File "/home/jmcrae/.local/miniconda/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/jmcrae/.local/miniconda/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/home/jmcrae/.local/miniconda/lib/python3.7/pickle.py", line 885, in _batch_setitems
    save(v)
  File "/home/jmcrae/.local/miniconda/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/jmcrae/.local/miniconda/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/home/jmcrae/.local/miniconda/lib/python3.7/pickle.py", line 885, in _batch_setitems
    save(v)
  File "/home/jmcrae/.local/miniconda/lib/python3.7/pickle.py", line 531, in save
    (t.__name__, obj))
_pickle.PicklingError: Can't pickle 'CompiledLib' object: <Lib object for 'bgen_reader._ffi'>
horta commented 4 years ago

Hi @jeremymcrae , thanks for using this package. It looks like the dask cluster is trying to serialize bgen-reader using Python's pickle module. It then complains that it cannot serialize a compiled object, bgen_reader._ffi, which is part of our package.
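The first error in the traceback comes from the standard pickle module, which cannot serialize functions defined inside other functions. A minimal sketch of that failure mode, using a hypothetical `make_reader` standing in for `_get_read_genotype` (not the actual bgen-reader code):

```python
import pickle

# Hypothetical stand-in for _get_read_genotype, which returns a
# function defined in its local scope (a closure).
def make_reader():
    def _read(index):
        return index
    return _read

reader = make_reader()

try:
    pickle.dumps(reader)
except AttributeError as exc:
    # Same shape as the first error in the traceback above:
    # "Can't pickle local object 'make_reader.<locals>._read'"
    print(exc)
```

Dask falls back to cloudpickle, which can handle local functions, but it then fails on the compiled CFFI object the closure captures (the `CompiledLib` error at the bottom of the traceback).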

I've never used a dask cluster, but https://distributed.dask.org/en/latest/serialization.html says that you can specify the type of serialization you want to use. Have you tried any of the other options?
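As a configuration sketch only: the page above describes `serializers`/`deserializers` keyword arguments on the distributed `Client`. Something like the following could steer data serialization away from plain pickle (`cluster` here is the SGECluster from the report above; this is untested against this issue):

```python
from distributed import Client

# Hypothetical sketch: prefer dask's own serialization, with msgpack
# as a fallback, for data moving between scheduler and workers.
client = Client(
    cluster,
    serializers=["dask", "msgpack"],
    deserializers=["dask", "msgpack"],
)
```

One caveat: dask may still use pickle/cloudpickle for the task *functions* themselves, so changing the data serializers alone might not be enough to avoid the closure error here.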