levinas closed this issue 10 years ago.
Did you happen to be logged in with your rast account? If so, I patched the problem in 42a080b56dd17b8860f9db3bbdb82e752b7e7365 (Christopher Bun, Aug 12, 2014, 8:01 PM).
I was not. I guess that's a separate error.
If terminating a job in the 'Data transfer' stage is the same as terminating a running job, the fix should be:
r"(Running|Stage)"
=> r"(Running|Stage|Data)"
26da2835e310a88d500959cc13ecbd450585d5be
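A minimal sketch of why the one-word change matters. It assumes the router tests the job's status string with Python's `re.search` against this pattern; the real code in router.py may use a different call or surrounding logic. `is_active`, `OLD_PATTERN` and `NEW_PATTERN` are hypothetical names, and the status strings are the ones that appear in this thread.

```python
import re

# Hypothetical stand-ins for the router's status check (assumption:
# the real send_kill_message may match differently).
OLD_PATTERN = re.compile(r"(Running|Stage)")
NEW_PATTERN = re.compile(r"(Running|Stage|Data)")


def is_active(status, pattern):
    """Return True if the status string marks a job that can still be killed."""
    return pattern.search(status) is not None


statuses = ["Data transfer", "Stage 1/3: spades", "Running", "Complete", "Terminated"]
for s in statuses:
    # Show which statuses each pattern treats as killable.
    print(f"{s!r:22} old={is_active(s, OLD_PATTERN)} new={is_active(s, NEW_PATTERN)}")
```

Under the old pattern, 'Data transfer' is not treated as killable, so a kill request in that stage falls through; the new pattern adds it while still leaving 'Complete' and 'Terminated' alone.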
Assigned to me. Will test it when you merge with dev.
It seems that if the kill command is run while the job is in the 'Data transfer' stage, the command is simply ignored. You can reproduce it with a dataset that is not too small:
$ ar-run -a spades --data 1; ar-kill -j 11
Job ID: 12
Kill request sent for job 12
$ ar-stat
| 12 | 1 | Stage 1/3: spades | 0:01:57 | None |
Solved via a lock on the compute's job list.
07153df63975e081bab8b935277e14f2af1cf17b
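One way a lock on a job list can close this kind of race, sketched in Python under stated assumptions: `JobList`, `add`, `kill`, `advance` and `status` are hypothetical names, not the compute node's real API, and the status strings are borrowed from this thread. Without a lock, a worker could read 'Data transfer', a kill could then mark the job terminated, and the worker would overwrite that with the next stage, so the kill appears to be ignored. Holding one lock across each check-and-update forces the two operations to happen one after the other.

```python
import threading


class JobList:
    """Hypothetical compute-side job list; every read-modify-write holds one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._status = {}

    def add(self, job_id):
        """Register a new job in its first stage."""
        with self._lock:
            self._status[job_id] = "Data transfer"

    def kill(self, job_id):
        """Mark a job terminated; return False if it is unknown or already finished."""
        with self._lock:
            if self._status.get(job_id) in (None, "Complete", "Terminated"):
                return False
            self._status[job_id] = "Terminated"
            return True

    def advance(self, job_id, new_status):
        """Move a known job to its next stage unless a kill request got there first."""
        with self._lock:
            current = self._status.get(job_id)
            if current is None or current == "Terminated":
                return False
            self._status[job_id] = new_status
            return True

    def status(self, job_id):
        """Return the current status, or None for an unknown job."""
        with self._lock:
            return self._status.get(job_id)
```

For example, after `jobs.add(12)` and `jobs.kill(12)`, a later `jobs.advance(12, "Stage 1/3: spades")` returns False and the status stays 'Terminated'.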
Great. Could you redeploy the dev server? Would this fix the test failure I'm seeing every other night?
ok 32 - main::test_simple_cases > ar-run -a spades --data $(cat data.2|sed "s/[^0-9]*//g") -m ...
Job 452: Removed From Queue
ok 33 - main::test_simple_cases > ar-kill -j $(cat job.7|sed "s/[^0-9]*//g")
...
ok 70 - main::test_simple_cases > ar-stat -j $(cat job.7|sed "s/[^0-9]*//g") > stat.term.7
not ok 71 - job properly terminated
# Failed test 'job properly terminated'
# at ./arast.t line 123.
# 'Complete
# '
# doesn't match '(?-xism:Terminated)'
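Each command in this log extracts a numeric ID with `sed "s/[^0-9]*//g"`, which deletes every non-digit character, and test 71 then requires the ar-stat output to match /Terminated/. A rough Python equivalent, assuming the saved file holds a line such as 'Job ID: 12' (the format ar-run prints above); `extract_id` and `properly_terminated` are hypothetical helpers, not part of arast.t.

```python
import re


def extract_id(text):
    """Mimic sed "s/[^0-9]*//g": delete every non-digit character."""
    return re.sub(r"[^0-9]+", "", text)


def properly_terminated(stat_output):
    """Mirror the failing check: the ar-stat output must mention 'Terminated'."""
    return re.search(r"Terminated", stat_output) is not None


print(extract_id("Job ID: 452\n"))
print(properly_terminated("Complete\n"))
```

The failing run corresponds to the second call: the job ran to 'Complete', so the kill never took effect. Because every non-digit is dropped, any other digits in the saved file would be concatenated onto the ID.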
I have a case where ar-kill fails on a job just submitted with a data ID: https://github.com/kbase/assembly/blob/master/test/arast.t#L102
Output:
I suspect the reason is that the send_kill_message function in the router misses the case where the job status is 'Data transfer': https://github.com/kbase/assembly/blob/master/lib/assembly/router.py#L53