Closed. melrom closed this issue 11 years ago
I will have a look at this today on the saga-python end to make sure that cancel() is implemented properly.
I have tested job.cancel() with the PBS adaptor (on india) extensively and it seems to work just fine. However, I'm not even sure if jobs get canceled through saga or if a termination signal is sent to the agent via Redis. Andre, can you clarify?
Both: cancel() is called and a stop signal is sent via Redis.
Melissa, now that I have checked the saga-python side, can you please investigate on the BigJob side?
@oleweidner: to test with pbs://localhost explicitly
Investigated pbs://localhost with saga-python master and devel-prod master: pilots seem to terminate properly, verified via qstat. Closing this ticket; will reopen if the behavior shows up again.
This bug applies to the devel-prod branch. It currently needs further investigation. I noticed it first, and Vishal then confirmed it.
The behavior is as follows: a pilot starts up and then executes CUs. At the end of the scripts, the following cancel commands are issued:
However, the Pilot still appears to be running in the queue until it hits the maximum walltime. This was first noticed using pbs+gsissh to Kraken, and then again using pbs://localhost on India.
We need to make sure the mechanism for shutting down Pilot Jobs after CUs are completed is working properly on this branch.
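One way to check this from a script is to poll the job state after calling cancel and see whether it reaches a terminal state before a timeout. This is a rough sketch only, not what BigJob or saga-python does internally: the helper names (`wait_for_termination`, `terminated_after_cancel`) and the state strings are made up for illustration, and the state getter is passed in as a callable so the check does not depend on any particular adaptor.

```python
import time

# Assumed state names, for illustration only. Replace them with the
# constants your job API actually reports.
TERMINAL_STATES = {"Done", "Failed", "Canceled"}


def wait_for_termination(get_state, timeout=60.0, interval=1.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns a terminal state or the timeout
    expires. Returns the last state observed."""
    deadline = clock() + timeout
    state = get_state()
    while state not in TERMINAL_STATES and clock() < deadline:
        sleep(interval)
        state = get_state()
    return state


def terminated_after_cancel(cancel, get_state, **kwargs):
    """Call cancel(), then report whether the job reached a terminal
    state within the timeout."""
    cancel()
    return wait_for_termination(get_state, **kwargs) in TERMINAL_STATES
```

With a saga-python job object this might be called as `terminated_after_cancel(job.cancel, lambda: job.state, timeout=120)`, assuming `job.state` reports values that match the set above. A `False` result would correspond to the behavior described here, where the pilot stays in the queue after cancel; checking qstat remains the independent confirmation.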