levinas closed this issue 9 years ago
There may be race conditions in how the job_list is managed. This is a separate run on a different server, but it seems related. It happened when two queued jobs started running at almost the same time, when new workers became available.
ERROR:root:Traceback (most recent call last):
  File "/space/arast/assembly/lib/assembly/consume.py", line 416, in callback
    self.compute(body)
  File "/space/arast/assembly/lib/assembly/consume.py", line 303, in compute
    self.job_list.pop(i)
  File "<string>", line 2, in pop
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 774, in _callmethod
    raise convert_to_error(kind, result)
IndexError: pop index out of range
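One plausible cause of the `IndexError` is that the index `i` is computed first and the list shrinks (another worker removes its own entry) before `pop(i)` runs. A minimal sketch of a safer removal, assuming each entry is a dict with a `job_id` key and that a lock is shared by everything that mutates the list. The names `remove_job`, `job_list_lock` and `job_id` are hypothetical and not taken from consume.py; with a `multiprocessing` manager list the lock would presumably come from the same manager rather than from `threading`.

```python
import threading

# Hypothetical lock guarding every mutation of the shared job list.
job_list_lock = threading.Lock()


def remove_job(job_list, job_id):
    """Remove the entry whose 'job_id' matches; return True if one was removed.

    The lookup and the pop happen under the same lock, so the index
    cannot go stale between the two steps.
    """
    with job_list_lock:
        for i, job in enumerate(job_list):
            if job["job_id"] == job_id:
                job_list.pop(i)
                return True
    # The job was already gone: report it instead of raising IndexError.
    return False
```

Calling `remove_job(jobs, 1)` twice returns `True` and then `False`, rather than failing on the second call.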
In the consume.py/compute() function, the lock is acquired before data download. I'm wondering if this is related to the cases I'm seeing where one big job is in "Data transfer" while others are "Queued".
Could the lock acquisition be moved to after the download, or to right before line 255? https://github.com/kbase/assembly/blob/master/lib/assembly/consume.py#L202
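To illustrate the suggestion, here is a rough sketch of taking the lock only after the transfer has finished, so that one large download does not keep other workers waiting. Everything in it is an assumption for illustration: `compute_lock`, `download`, the `transfer_seconds` field and the `log` list are hypothetical stand-ins, not the actual code in consume.py.

```python
import threading
import time

# Hypothetical stand-in for the lock that compute() acquires.
compute_lock = threading.Lock()


def download(job):
    # Placeholder for the data transfer step; no lock is held here.
    time.sleep(job.get("transfer_seconds", 0))
    return "data for %s" % job["job_id"]


def compute(job, log):
    # The download runs outside the lock, so workers can transfer in parallel.
    data = download(job)
    # The critical section starts only once the data is local.
    with compute_lock:
        log.append((job["job_id"], data))


if __name__ == "__main__":
    log = []
    jobs = [{"job_id": "big", "transfer_seconds": 0.3}, {"job_id": "small"}]
    workers = [threading.Thread(target=compute, args=(j, log)) for j in jobs]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(log)
```

With the lock taken before `download()` instead, the small job would have to wait for the big transfer to finish whenever the big job got the lock first.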
Hmm. Can you reproduce this? When do those error messages pop up, respectively?
On Sat Nov 29 2014 at 10:00:30 PM Fangfang Xia notifications@github.com wrote:
> In the consume.py/compute() function, the lock is acquired before data download. I'm wondering if this related to the cases I'm seeing where one big job is in "Data transfer" while others are "Queued".
> Could this be moved to after the download or right before line 255? https://github.com/kbase/assembly/blob/master/lib/assembly/consume.py#L202
This is the first time I've seen this job status: