rescalante-lilly / ruffus

Automatically exported from code.google.com/p/ruffus
MIT License

Ruffus randomly hanging on last jobs of a task #55

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I find it hard to reproduce the problem at will, but sooner or later, while my 
pipeline is running in multiprocessing mode, Ruffus hangs at the end of one of 
the tasks (typically the last one, but it has happened with earlier ones too). 
This happens with both the official 2.2 version and the 2.3b available in this 
repository.

This appears to be because one of the last jobs fails to be run by any of the 
workers (I modified the count_remaining_jobs variable to also keep track of the 
job names, and at the point it hangs, at least one of the jobs is still on that 
list and doesn't seem to have been executed) and so the whole thing gets hung 
on a "waiting_for_more_tasks_to_complete" situation.
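
For anyone trying the same diagnosis, here is a minimal sketch of tracking outstanding jobs by name instead of by a bare count. It is not Ruffus code: the `PendingJobs` class and the job names are hypothetical, and where to hook it into the scheduler is left open.

```python
class PendingJobs:
    """Track outstanding jobs by name (hypothetical helper, not part of Ruffus)."""

    def __init__(self):
        self._pending = set()

    def submitted(self, job_name):
        # Record a job when it is handed to the worker pool.
        self._pending.add(job_name)

    def completed(self, job_name):
        # discard() tolerates a completion that is reported twice.
        self._pending.discard(job_name)

    def remaining(self):
        # Sorted so that repeated log lines are easy to compare.
        return sorted(self._pending)


pending = PendingJobs()
for name in ("job_a", "job_b", "job_c"):  # made-up job names
    pending.submitted(name)
pending.completed("job_a")
pending.completed("job_c")
print(pending.remaining())  # names still waiting on a worker
```

Logging `remaining()` each time the scheduler waits shows which job never reached a worker, rather than only how many are left.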

There is nothing special about the job, and no exceptions are being raised. 
Killing and restarting the pipeline gets past the stuck point with no problem. 
The hang seems to need a decent-sized queue of jobs before it can occur, which 
makes diagnosing it slow and tedious.

I've been trying to debug this myself for days, but I can't seem to find how to 
fix the hanging issue. Please help.

Original issue reported on code.google.com by dolo...@gmail.com on 26 Jun 2013 at 5:27

GoogleCodeExporter commented 9 years ago
It is possible that the process pool on your setup needs special cleaning up, 
or there is some sort of race condition when that is happening.

I have noticed similar weirdness when running in "screen" sessions (on Linux).

Could you possibly try adding the following lines at task.py:2959?

Where it currently says:

    syncmanager.shutdown()

Change it to the following, which adds three lines:

    syncmanager.shutdown()
    if pool:
        pool.close()
        pool.join()
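
The same shutdown order can be tried outside Ruffus with the standard library alone. This is a rough sketch under assumptions: the helper name `shut_down` is made up for illustration, and the pool and manager are created here directly rather than the way task.py creates them.

```python
import multiprocessing


def shut_down(syncmanager, pool):
    """Stop the manager, then close and join the worker pool (hypothetical helper)."""
    syncmanager.shutdown()
    if pool:  # pool may be None when not running in multiprocessing mode
        pool.close()  # no further jobs will be submitted
        pool.join()   # block until every worker process has exited


if __name__ == "__main__":
    syncmanager = multiprocessing.Manager()  # returns a started SyncManager
    pool = multiprocessing.Pool(2)
    print(pool.map(abs, [-1, -2, 3]))
    shut_down(syncmanager, pool)
```

`pool.join()` may only be called after `close()` or `terminate()`, which is why the two calls go together.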

Original comment by bunbu...@gmail.com on 29 Aug 2013 at 5:36

GoogleCodeExporter commented 9 years ago

Original comment by bunbu...@gmail.com on 23 Nov 2013 at 1:54