kbaseattic / assembly

An extensible framework for genome assembly.
MIT License

job list lock bug #264

Closed levinas closed 9 years ago

levinas commented 9 years ago

This is the first time I've seen this job status:

ERROR:root:Traceback (most recent call last):
  File "/disks/arast/assembly/lib/assembly/consume.py", line 416, in callback
    self.compute(body)
  File "/disks/arast/assembly/lib/assembly/consume.py", line 257, in compute
    self.job_list_lock.release()
ValueError: semaphore or lock released too many times
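
For context, this message is what a `multiprocessing` lock raises when `release()` is called more often than `acquire()`. The sketch below reproduces that error in isolation and shows a `with` block as one way to keep the two calls paired. It is a minimal illustration rather than the code in `consume.py`: the names `double_release` and `update_jobs` are hypothetical, and `job_list_lock` here is only a stand-in for `self.job_list_lock`.

```python
import multiprocessing

# Stand-in for self.job_list_lock (assumption: a multiprocessing lock).
job_list_lock = multiprocessing.Lock()


def double_release(lock):
    """Release a lock once more than it was acquired and return the error text."""
    lock.acquire()
    lock.release()
    try:
        lock.release()  # second release with no matching acquire
    except ValueError as exc:
        return str(exc)
    return None


def update_jobs(lock, jobs, job):
    """Append to the shared list inside a with-block.

    The with-block releases the lock exactly once, including when the body
    raises or returns early.
    """
    with lock:
        jobs.append(job)
    return jobs


if __name__ == "__main__":
    print(double_release(job_list_lock))
    print(update_jobs(job_list_lock, [], "job-1"))
```

If `compute()` has several exit paths, any path that calls `release()` without having reached `acquire()` (or that releases twice) could produce this traceback.
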
levinas commented 9 years ago

There may be race conditions in how the job_list is managed. This is a separate run on a different server, but it seems related. It happened when two queued jobs started running at almost the same time, when new workers became available.

ERROR:root:Traceback (most recent call last):
  File "/space/arast/assembly/lib/assembly/consume.py", line 416, in callback
    self.compute(body)
  File "/space/arast/assembly/lib/assembly/consume.py", line 303, in compute
    self.job_list.pop(i)
  File "<string>", line 2, in pop
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 774, in _callmethod
    raise convert_to_error(kind, result)
IndexError: pop index out of range
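
One way a `pop(i)` on a shared list can fail like this is when the index is computed first and the list shrinks before the pop runs. The sketch below does the lookup and the removal inside a single critical section, so the index cannot go stale. It is an illustration under assumptions: `remove_job` is a hypothetical helper, the entries are assumed to be dicts with a `job_id` key, and a plain list with a `threading.Lock` stands in for the manager list and its lock.

```python
import threading


def remove_job(job_list, lock, job_id):
    """Find and remove the entry for job_id while holding the lock.

    Returns the removed entry, or None if it is no longer in the list
    (for example because another worker already removed it).
    """
    with lock:
        for i, job in enumerate(job_list):
            if job["job_id"] == job_id:
                # The index is still valid here: nobody else can
                # modify the list while we hold the lock.
                return job_list.pop(i)
    return None


if __name__ == "__main__":
    jobs = [{"job_id": 1}, {"job_id": 2}]
    lock = threading.Lock()
    workers = [
        threading.Thread(target=remove_job, args=(jobs, lock, job_id))
        for job_id in (1, 2)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(jobs)
```

Returning `None` for a missing entry, rather than raising, is a design choice for the sketch; the real code may prefer to log that case.
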
levinas commented 9 years ago

In the consume.py/compute() function, the lock is acquired before data download. I'm wondering if this is related to the cases I'm seeing where one big job is in "Data transfer" while others are "Queued".

Could the lock acquisition be moved to after the download, or to right before line 255? https://github.com/kbase/assembly/blob/master/lib/assembly/consume.py#L202
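
To make the suggestion concrete, here is a rough sketch of the idea: do the slow download without holding the lock, and take the lock only for the short update of the shared job list. Everything in it is assumed for illustration, including `fetch_data`, the job dict layout, and the status strings; it does not reflect the actual structure of `compute()`.

```python
import threading
import time

job_list = []
job_list_lock = threading.Lock()


def fetch_data(job):
    """Hypothetical placeholder for the data download step."""
    time.sleep(job.get("download_seconds", 0))
    return "data-for-%s" % job["job_id"]


def compute(job):
    """Download first, then hold the lock only while touching job_list."""
    job["status"] = "Data transfer"
    data = fetch_data(job)  # slow step, no lock held
    with job_list_lock:  # short critical section
        job["status"] = "Running"
        job_list.append(job)
    return data


if __name__ == "__main__":
    big = threading.Thread(
        target=compute, args=({"job_id": "big", "download_seconds": 0.5},)
    )
    small = threading.Thread(target=compute, args=({"job_id": "small"},))
    big.start()
    small.start()
    big.join()
    small.join()
    print([job["job_id"] for job in job_list])
```

With the lock taken this late, a large download in one worker should not keep other workers waiting on the job list.
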

cbun commented 9 years ago

Hmm. Can you reproduce this? When do those error messages pop up, respectively?

levinas commented 9 years ago

https://github.com/kbase/assembly/commit/f1d562cff6af0bc190214912085b469d9b9cb4c3
https://github.com/kbase/assembly/commit/a6081d544bb37f3e1e0e6d838143869dd9e893f2