malariagen / ag1000g-phase3-data-paper


Q89 male hets #17

Closed hardingnj closed 4 years ago

hardingnj commented 4 years ago

I'm getting an unexpected Killed Worker error. Is this what you experienced, @jonbrenas?

I'm stumped!


hardingnj commented 4 years ago

Hmm, some progress. It only seems to happen when I use allel.GenotypeDaskArray. Otherwise it runs as expected.

In the logs I find:

distributed.worker - ERROR - maximum recursion depth exceeded
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/distributed/worker.py", line 905, in handle_scheduler
    comm, every_cycle=[self.ensure_communicating, self.ensure_computing]
  File "/opt/conda/lib/python3.7/site-packages/distributed/core.py", line 456, in handle_stream
    msgs = await comm.read()
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/tcp.py", line 222, in read
    frames, deserialize=self.deserialize, deserializers=deserializers
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/utils.py", line 69, in from_frames
    res = _from_frames()
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/utils.py", line 55, in _from_frames
    frames, deserialize=deserialize, deserializers=deserializers
  File "/opt/conda/lib/python3.7/site-packages/distributed/protocol/core.py", line 124, in loads
    value = _deserialize(head, fs, deserializers=deserializers)
  File "/opt/conda/lib/python3.7/site-
...

However, if I return the dask object from the function load_calldata_by_sampleset and then pass that into allel.GenotypeDaskArray, it works as expected.

I think this has something to do with __init__() not being handled properly in distributed, though I suspect I need @alimanfoo to diagnose it fully. My understanding of scope etc. in Python is limited.

For the moment, though, deleting two lines in phase3_data.py fixes my issue. I'm unsure whether this is the same mysterious cause that @jonbrenas and @alimanfoo observed.

alimanfoo commented 4 years ago

Using dask distributed does mean that some pickling of objects and functions happens to move them to workers, so it could be there is a problem with that. Would it be possible to make a small reproducible example notebook?
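
To illustrate the kind of pickling problem meant here, below is a minimal sketch of one common way a "maximum recursion depth exceeded" error can appear during deserialization. It is not a diagnosis of this notebook: the `Wrapper` and `SafeWrapper` classes and the `round_trip` helper are hypothetical names made up for the example, and it only assumes that the object being sent to a worker forwards unknown attributes through `__getattr__`. Unpickling creates the instance without calling `__init__`, so the forwarded attribute does not exist yet when pickle probes for `__setstate__`.

```python
import pickle


class Wrapper:
    """Hypothetical wrapper that forwards unknown attributes to wrapped values."""

    def __init__(self, values):
        self._values = values

    def __getattr__(self, name):
        # Only called when normal lookup fails. During unpickling the
        # instance exists before __init__ has run, so self._values is
        # missing and this lookup re-enters __getattr__ repeatedly.
        return getattr(self._values, name)


class SafeWrapper(Wrapper):
    """Same idea, but refuses to forward until _values is really set."""

    def __getattr__(self, name):
        if "_values" not in self.__dict__:
            raise AttributeError(name)
        return getattr(self._values, name)


def round_trip(obj):
    """Pickle and unpickle obj, roughly what moving it to a worker involves."""
    return pickle.loads(pickle.dumps(obj))


try:
    round_trip(Wrapper([1, 2, 3]))
    outcome = "ok"
except RecursionError:
    outcome = "RecursionError"

# The guarded version survives the round trip and still forwards attributes.
restored = round_trip(SafeWrapper([1, 2, 3]))
```

If something like this were the cause, it would also be consistent with the error only showing up when the wrapped object itself has to be serialized, rather than when a plain dask array is returned and wrapped afterwards, but that is an assumption rather than something confirmed in this thread.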

hardingnj commented 4 years ago

I'll do my best!

hardingnj commented 4 years ago

I have been trying, but no luck unfortunately... it just seems to work as expected. I think I'll just implement the workaround and press on.

hardingnj commented 4 years ago

Superseded by #18.

leehart commented 4 years ago

Deleting branch.