Closed puorc closed 4 years ago
@puorc you might want to check the develop branch on C-PAC: the model is no longer passed as a parameter, only the path to the model. The model itself is then created during execution, within the `predict_volumes` function.
Hi Anibal, regarding the UNet case, I assumed it might be a PyTorch issue, but it looks like the same problem happens in ANTs registration as well.
https://github.com/radiome-lab/registration/blob/master/tests/test_ants.py (set `linear` to `False`)
Info:
2020-04-02 09:58:48,282-15s radiome.execution.executor INFO: Joining execution of 3 executions: ['execute_subgraph-7ef3a051-78ca-4bfe-ae16-c004f19f8e88', 'execute_subgraph-64ae3c36-bb7b-4feb-9b71-e6064dfa5edb', 'execute_subgraph-ce227b59-a5e3-4ec4-8385-7a01ffbee450']
Exception ignored in: <function State.__del__ at 0x113cdcc10>
Traceback (most recent call last):
File "/Users/pu/workspace/radiome/radiome/core/execution/__init__.py", line 52, in __del__
if not isinstance(self._resource, Job):
AttributeError: 'State' object has no attribute '_resource'
distributed.worker - WARNING - Could not deserialize task
Traceback (most recent call last):
File "/Users/pu/workspace/registration/venv/lib/python3.8/site-packages/distributed/worker.py", line 2410, in _maybe_deserialize_task
function, args, kwargs = _deserialize(*self.tasks[key])
File "/Users/pu/workspace/registration/venv/lib/python3.8/site-packages/distributed/worker.py", line 3247, in _deserialize
kwargs = pickle.loads(kwargs)
File "/Users/pu/workspace/registration/venv/lib/python3.8/site-packages/distributed/protocol/pickle.py", line 59, in loads
return pickle.loads(x)
ModuleNotFoundError: No module named 'radiome_workflow_ants'
Exception ignored in: <function State.__del__ at 0x11e20bf70>
Traceback (most recent call last):
File "/Users/pu/workspace/radiome/radiome/core/execution/__init__.py", line 52, in __del__
if not isinstance(self._resource, Job):
AttributeError: 'State' object has no attribute '_resource'
distributed.worker - WARNING - Could not deserialize task
Traceback (most recent call last):
File "/Users/pu/workspace/registration/venv/lib/python3.8/site-packages/distributed/worker.py", line 2410, in _maybe_deserialize_task
function, args, kwargs = _deserialize(*self.tasks[key])
File "/Users/pu/workspace/registration/venv/lib/python3.8/site-packages/distributed/worker.py", line 3247, in _deserialize
kwargs = pickle.loads(kwargs)
File "/Users/pu/workspace/registration/venv/lib/python3.8/site-packages/distributed/protocol/pickle.py", line 59, in loads
return pickle.loads(x)
ModuleNotFoundError: No module named 'radiome_workflow_ants'
Exception ignored in: <function State.__del__ at 0x112259f70>
Traceback (most recent call last):
File "/Users/pu/workspace/radiome/radiome/core/execution/__init__.py", line 52, in __del__
if not isinstance(self._resource, Job):
AttributeError: 'State' object has no attribute '_resource'
distributed.worker - WARNING - Could not deserialize task
Traceback (most recent call last):
File "/Users/pu/workspace/registration/venv/lib/python3.8/site-packages/distributed/worker.py", line 2410, in _maybe_deserialize_task
function, args, kwargs = _deserialize(*self.tasks[key])
File "/Users/pu/workspace/registration/venv/lib/python3.8/site-packages/distributed/worker.py", line 3247, in _deserialize
kwargs = pickle.loads(kwargs)
File "/Users/pu/workspace/registration/venv/lib/python3.8/site-packages/distributed/protocol/pickle.py", line 59, in loads
return pickle.loads(x)
ModuleNotFoundError: No module named 'radiome_workflow_ants'
2020-04-02 09:58:49,257-15s radiome.execution.state INFO: Gathering resources
2020-04-02 09:58:49,258-15s radiome.execution.state INFO: Wiping out PythonJob(d7f6c670,calc_ants_warp) directory.
2020-04-02 09:58:49,259-15s radiome.execution.state INFO: Wiping out Computed(PythonJob(d7f6c670,calc_ants_warp),warped_image) directory.
2020-04-02 09:58:49,260-15s radiome.execution.state INFO: Wiping out PythonJob(d7f6c670,calc_ants_warp) directory.
2020-04-02 09:58:49,260-15s radiome.execution.state INFO: Wiping out Computed(PythonJob(d7f6c670,calc_ants_warp),warped_image) directory.
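A note on the `Exception ignored in: <function State.__del__ ...>` noise in the log: `__del__` can run on a partially constructed object, e.g. when `__init__` raised before the attribute was assigned, or when deserialization bypassed `__init__`, and then a direct attribute access inside it raises `AttributeError`. A minimal sketch of the pattern and a defensive guard (the `State`/`Job` names only mirror the traceback; radiome's actual classes differ):

```python
class Job:
    pass

class State:
    def __init__(self, resource):
        if resource is None:
            raise ValueError("no resource")  # __init__ bails out early...
        self._resource = resource            # ...so this attribute is never set

    def __del__(self):
        # getattr with a default avoids AttributeError when __del__ runs
        # on a partially constructed (or oddly deserialized) instance.
        resource = getattr(self, "_resource", None)
        if resource is not None and not isinstance(resource, Job):
            pass  # resource cleanup would go here
```

Without the guard, a failed `State(None)` construction leaves exactly this kind of "Exception ignored in `State.__del__`" warning behind when the half-built object is garbage-collected.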
Hi @puorc
It actually works for me. I pip-installed the most recent versions of both repos and changed `linear = False` (I could see it spawned the Nanny from Dask, so it is working). You might want to check your installation, since the error is clearly an environment problem (`No module named 'radiome_workflow_ants'`).
Setting up a Travis CI test config would be helpful for these cases.
Nevertheless, it certainly needs better error handling. I cannot reproduce it though, so I will try to handle it blindly :D
It keeps happening on my computer even after I set up a new virtualenv and reinstalled these packages. I found that the problem is that modules imported by their file path at runtime may not sync to the worker processes correctly through deserialization. So I made some changes to how the workflow functions are imported, following some answers from Stack Overflow, and it looks good now.
Tried to execute UNet in DaskExecutor, but it failed, while it succeeded and produced the expected results in Executor.
Code:
https://github.com/radiome-lab/preprocessing/blob/master/tests/test_skullstrip.py
https://github.com/radiome-lab/preprocessing/tree/master/radiome/workflows/preprocessing/skullstrip/unet
Info:
Possible causes and analysis:
`__del__`: the error should happen in this block (`State.__del__` in radiome/core/execution/__init__.py, per the traceback).
Tested it in Executor and it looks good, so I guess there may be errors in the serialization/deserialization process.
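One way to surface this class of failure early, before a DaskExecutor run, is to check that each job function survives a pickle round-trip, since that is essentially what Dask does when shipping tasks to workers. A rough sketch, not part of the radiome API; it catches functions that cannot be pickled at all, while worker-only failures (module present locally but absent remotely) still require the module to be importable on the workers:

```python
import pickle

def check_picklable(fn):
    """Round-trip `fn` through pickle; return (restored_fn, error)."""
    try:
        return pickle.loads(pickle.dumps(fn)), None
    except Exception as exc:  # PicklingError, AttributeError, ...
        return None, exc

# A function importable by module name round-trips fine.
fn, err = check_picklable(len)
assert err is None and fn("abc") == 3

# A locally defined closure cannot be pickled by plain pickle --
# a related failure mode to a worker that cannot import the
# defining module and raises ModuleNotFoundError at loads().
def make_closure():
    def inner(x):
        return x
    return inner

fn, err = check_picklable(make_closure())
assert fn is None and err is not None
```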