nils-braun / b2luigi

Task scheduling and batch running for basf2 jobs made simple
GNU General Public License v3.0

Continue running other tasks when one gbasf2 download failed #69

Closed meliache closed 3 years ago

meliache commented 3 years ago

The Gbasf2Process marks a luigi task that is associated with a gbasf2 project as successful once all jobs in that project are DONE. What then remains is downloading the outputs of those jobs, and gb2_ds_get often fails to download all files. Currently, luigi then validates the download, sees that the download directory is missing files, and instead of moving the dataset from the partial download directory to the final output directory, it raises a RuntimeError saying that the download wasn't successful.

But that exception stops the whole luigi process and with it the processing of all tasks. It would be more convenient to let the processing of all other tasks/projects continue in such a case.

That could be solved by just printing a warning and returning early from the download function, so that the partial download is never moved to the output directory.
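A minimal sketch of that warn-and-return idea, not the actual b2luigi code: the function name `move_if_complete` and its parameters are hypothetical, and the completeness check is simplified to comparing file names in the partial download directory against an expected list.

```python
import os
import shutil
import warnings


def move_if_complete(partial_dir, output_dir, expected_files):
    """Move partial_dir to output_dir only if every expected file is present.

    Hypothetical helper for illustration. Returns True if the move happened,
    False if the download was incomplete and nothing was moved.
    """
    downloaded = set(os.listdir(partial_dir))
    missing = sorted(set(expected_files) - downloaded)
    if missing:
        # Warn instead of raising, so other tasks can keep running.
        warnings.warn(
            f"Download incomplete, {len(missing)} file(s) missing: {missing}"
        )
        return False  # early return: the partial download stays where it is
    shutil.move(partial_dir, output_dir)
    return True
```

Because the output directory is only created by the final move, an incomplete download never produces the task output, and a later retry can pick up from the partial directory.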

However, I must check that the process is never marked as successful just because the get_job_status() method returns successful. That method is currently coupled to the status on the grid and not to the download status, because I use it to decide when to start the download. I'll have to think about how to solve that cleverly.
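One possible way to decouple the two, sketched under assumptions: keep the grid status and the download result as separate inputs and derive the status reported to luigi from both. The `Status` enum and the `combined_status` function are hypothetical names for illustration and not part of b2luigi.

```python
from enum import Enum


class Status(Enum):
    RUNNING = "running"
    SUCCESSFUL = "successful"
    FAILED = "failed"


def combined_status(all_jobs_done, any_job_failed, download_ok=None):
    """Derive the task status from grid status and download result.

    download_ok is None while no download has been attempted, True after a
    complete download, and False after an incomplete one (all hypothetical).
    """
    if any_job_failed:
        return Status.FAILED
    if not all_jobs_done:
        return Status.RUNNING
    if download_ok is None:
        # Grid jobs are DONE, but the download still has to be started.
        return Status.RUNNING
    return Status.SUCCESSFUL if download_ok else Status.FAILED
```

With this split, jobs being DONE on the grid only triggers the download, and the task counts as successful only after the download has been validated.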