Closed mturilli closed 9 years ago
Its likely that this line is responsible:
2014:11:30 15:32:35 radical.pilot.MainProcess: [WARNING ] could not reconnect to pilot for state check (global name 'curr_info' is not defined)
That pointed to an error on the saga level job monitoring thread -- its now fixed in the SAGA devel branch.
Scenario: 2-stages workflow of the AIMES demo. The single CU of the second stage fails. From the output - weak indicator - it appears that the pilot where the CU is running is canceled before the CU has the opportunity to terminate.
Session: 547b37db20a6417f39f30ffa
From the pilot-547b37e620a6417f39f30ffe AGENT.STDERR :
From the unit-547b387520a6417f39f31016 STDERR:
From debug output:
Full log available on request.