irods / irods_capability_automated_ingest

Other
12 stars 15 forks source link

post_job handler is called multiple times #201

Closed rmoreas closed 1 year ago

rmoreas commented 1 year ago

I can't get my head around the processing of tasks when using concurrent worker threads/processes (worker concurrency > 1).

I would think that each task is processed once by one worker process/thread, but it seems that individual tasks are executed multiple times by different workers. See for example the log snippets below: the sync_dir task is added multiple times by different workers for the same job.

[2023-06-16 19:25:10,015: WARNING/ForkPoolWorker-4] {"task": "sync_dir", "path": "/data/BIAX/RUNS/test01", "job_name": "biax", "event": "incr_job_name", "logger": "irods_sync//DEBUG", "level": "info", "@timestamp": "2023-06-16T19:25:10.014871+00:00"}
...
[2023-06-16 19:25:10,028: WARNING/ForkPoolWorker-3] {"task": "sync_dir", "path": "/data/BIAX/RUNS/test01", "job_name": "biax", "event": "incr_job_name", "logger": "irods_sync//DEBUG", "level": "info", "@timestamp": "2023-06-16T19:25:10.028528+00:00"}
...

Also the post_job handler seems to be called multiple times by different workers for the same job, as can been seen in log snippets below; the log message "Checking marker files: ..." is created in the post_job handler.

[2023-06-16 19:25:10,097: WARNING/ForkPoolWorker-2] {"event": "Checking marker files: [PosixPath('/data/BIAX/RUNS/test01/test.xls')]", "logger": "irods_sync//DEBUG", "level": "info", "@timestamp": "2023-06-16T19:25:10.097439+00:00"}
[2023-06-16 19:25:10,101: WARNING/ForkPoolWorker-4] {"event": "Checking marker files: [PosixPath('/data/BIAX/RUNS/test01/test.xls')]", "logger": "irods_sync//DEBUG", "level": "info", "@timestamp": "2023-06-16T19:25:10.101035+00:00"}
...
[2023-06-16 19:25:10,126: WARNING/ForkPoolWorker-1] {"event": "Checking marker files: [PosixPath('/data/BIAX/RUNS/test01/test.xls')]", "logger": "irods_sync//DEBUG", "level": "info", "@timestamp": "2023-06-16T19:25:10.126337+00:00"}
...
[2023-06-16 19:25:10,134: WARNING/ForkPoolWorker-3] {"event": "Checking marker files: [PosixPath('/data/BIAX/RUNS/test01/test.xls')]", "logger": "irods_sync//DEBUG", "level": "info", "@timestamp": "2023-06-16T19:25:10.134783+00:00"}

What am I missing here? Is this working as designed?

/cc @alanking @trel

rmoreas commented 1 year ago

I did some more investigation by using Flower to get better insight of what is going on.

It seems that the sync_path, sync_dir and sync_files tasks are correctly executed by one and only one worker, but the post_job handler is called multiple times per job.

trel commented 1 year ago

So this is a real bug? Okay. We'll definitely poke at it to confirm and fix.

rmoreas commented 1 year ago

I doubt that this line is correct:

done = job.tasks_handle().decr() and not job.periodic()

I my understanding the job is done if the tasks count drops to zero: that is if job.tasks_handle().decr() == 0. Below zero should never happen.

rmoreas commented 1 year ago

It seems that commit 45e04b46 introduced this bug, The original code before this commit was:

done = retry(logger, decr_with_key, r, tasks_key, job_name) == 0 and not retry(logger, periodic, r, job_name)

So before the condition was indeed == 0.

rmoreas commented 1 year ago

Here a PR #202

alanking commented 1 year ago

Thanks for the contribution, @rmoreas!

I'm going to leave this open for now so that I can add a test.

alanking commented 1 year ago

I have written a test which demonstrates the fix for post_job. PR to come Soon

trel commented 1 year ago

nice.