Closed mturilli closed 1 year ago
If this behavior changes, please let me know. Ref: https://github.com/SCALE-MS/scale-ms/blob/d745bf6acb593f9a0f7bec6227a2d044ed829cb2/src/scalems/radical/runtime.py#L992
This might explain why the notebooks stall while the documentation is being built. We end up with multiple concurrent instances of RP components: I see the processes from at least two notebooks still alive while Sphinx is running.
Can you please provide the radical-stack for this? Thanks!
$ radical-stack
python : /home/mturilli/ve/docs/bin/python3
pythonpath :
version : 3.10.6
virtualenv : /home/mturilli/ve/docs
radical.analytics : 1.20.1
radical.entk : 1.30.0
radical.gtod : 1.20.1
radical.pilot : 1.21.0
radical.saga : 1.21.0
radical.utils : 1.21.0
and on `three`:
$ radical-stack
python : /home/mturilli/ve-notebooks/bin/python3
pythonpath :
version : 3.10.6
virtualenv : /home/mturilli/ve-notebooks
radical.analytics : 1.20.1
radical.entk : 1.30.0
radical.gtod : 1.20.1
radical.pilot : 1.21.0
radical.saga : 1.21.0
radical.utils : 1.21.0
This is the same thing we always have on `KeyboardInterrupts`: the pilot description has `exit_on_error` set (which is the default) and the pilot fails. Setting that flag to `False` removes the exception.
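For reference, a minimal sketch of that workaround, assuming the pilot is described with a plain dictionary that is later handed to `radical.pilot.PilotDescription`. The resource label, runtime and core count are placeholder values and are not taken from this issue; only the `exit_on_error` entry matters here.

```python
# Hypothetical pilot description; every value except `exit_on_error`
# is a placeholder chosen for illustration.
pilot_description = {
    "resource": "local.localhost",  # placeholder resource label
    "runtime": 30,                  # placeholder runtime in minutes
    "cores": 4,                     # placeholder core count
    "exit_on_error": False,         # do not raise in the application if the pilot fails
}

# With RADICAL-Pilot installed, the dictionary would typically be used as:
#   import radical.pilot as rp
#   pdesc = rp.PilotDescription(pilot_description)
```

Note that this only suppresses the exception; the pilot still ends up in a failed state if it is hard-killed.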
The pilot fails because it is submitted and then, during session termination, a cancellation request is issued right away, before the bootstrapper has a chance to complete. The pilot therefore does not react to the termination request in time, and the session falls back to a hard kill, which fails the pilot. That triggers the corresponding state update, and the exception is raised because the pilot description asks for it. So this is all working as intended. Whether it should work like this is a different question, but right now it is a policy question, not an implementation question.
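To make the sequence concrete, here is a toy model of the race described above. It is not RADICAL-Pilot code: `ToyPilot`, `State`, `bootstrap_time` and `grace` are hypothetical names introduced only for illustration, and the timings are made up.

```python
import enum


class State(enum.Enum):
    NEW = "NEW"
    CANCELED = "CANCELED"
    FAILED = "FAILED"


class ToyPilot:
    """Toy stand-in for a pilot; not the RADICAL-Pilot implementation."""

    def __init__(self, exit_on_error=True, bootstrap_time=5.0):
        self.exit_on_error = exit_on_error
        self.bootstrap_time = bootstrap_time  # seconds until the agent can react
        self.state = State.NEW

    def terminate(self, elapsed, grace=1.0):
        """Request cancellation `elapsed` seconds after submission.

        If the bootstrapper finishes within the grace period, the pilot
        reacts to the request and is canceled cleanly. Otherwise the
        session falls back to a hard kill, which marks the pilot FAILED.
        """
        if elapsed + grace >= self.bootstrap_time:
            self.state = State.CANCELED
        else:
            self.state = State.FAILED
            if self.exit_on_error:
                # The description asked for an error on pilot failure.
                raise RuntimeError("pilot failed during termination")
        return self.state
```

In this model, `ToyPilot(exit_on_error=False).terminate(elapsed=0.1)` returns `State.FAILED` without raising, while the same call with `exit_on_error=True` raises. A pilot that had time to bootstrap (for example `elapsed=10.0`) is canceled cleanly either way.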
I pushed an example fix for the configuration notebook in the nb3 branch.
This has been replicated on two Linux hosts, including `three`.