meliache opened this issue 3 years ago
This would be a very interesting functionality for our project. I was wondering whether one could split the Basf2PathTask that runs on the grid into separate luigi tasks, e.g. a JobSubmissionTask, a JobMonitoringTask and a DatasetDownloadingTask, which might help to parallelize the code, if that makes sense.
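For illustration, here is a minimal sketch of what such a split could look like as a chain of b2luigi tasks connected via `requires()`. All task internals are assumptions: the actual submission, polling and download steps are only placeholder comments (e.g. calls to `gbasf2`, `gb2_job_status`, `gb2_ds_get`), not existing b2luigi code. The point is only that each stage becomes its own schedulable task.

```python
import b2luigi


class JobSubmissionTask(b2luigi.Task):
    """Submit the gbasf2 project (sketch, the actual submission is a placeholder)."""
    project_name = b2luigi.Parameter()

    def output(self):
        yield self.add_to_output("submitted.txt")

    def run(self):
        # placeholder for the actual ``gbasf2 ...`` submission command
        with open(self.get_output_file_name("submitted.txt"), "w") as f:
            f.write(self.project_name)


class JobMonitoringTask(b2luigi.Task):
    """Wait for all jobs in the project to finish (sketch)."""
    project_name = b2luigi.Parameter()

    def requires(self):
        yield JobSubmissionTask(project_name=self.project_name)

    def output(self):
        yield self.add_to_output("done.txt")

    def run(self):
        # placeholder for polling the project status, e.g. via ``gb2_job_status``
        with open(self.get_output_file_name("done.txt"), "w") as f:
            f.write("done")


class DatasetDownloadingTask(b2luigi.Task):
    """Download the output dataset once the project is done (sketch)."""
    project_name = b2luigi.Parameter()

    def requires(self):
        yield JobMonitoringTask(project_name=self.project_name)

    def output(self):
        yield self.add_to_output("dataset.txt")

    def run(self):
        # placeholder for the actual ``gb2_ds_get`` download
        with open(self.get_output_file_name("dataset.txt"), "w") as f:
            f.write("downloaded")
```

Running `b2luigi.process([DatasetDownloadingTask(project_name="my_project")])` would then pull in the whole chain, and the scheduler could run submission, monitoring and download of different projects independently of each other.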
Hi @meliache,
this ticket is the closest to the topic I would like to raise. The gbasf2 project submission algorithm does not submit all projects first and then wait for them to finish; rather, some project submissions happen after the project monitoring has already started. This is not optimal, and we need to ensure that all gbasf2 projects have been submitted at the start of the b2luigi process. I am still not sure why this happens, but do you have an idea how to fix the issue?
The gbasf2 submission and dataset download operations take a long time. Even when remote workers run in parallel, scheduling by default happens serially (except when the `parallel_scheduling` config option is set to `true`; however, that didn't work for me, so if you had success with it please let me know). The long gbasf2 submission and the dataset download seem to block the scheduling until that operation is done. This is something I can live with, since usually only a few gbasf2 projects are required, but it would be cool to do something about it.

The gbasf2 dataset download is currently triggered in the `get_job_status` method as a blocking subroutine call once the gbasf2 project is all done. Maybe we could initiate the download as an async subprocess and only mark the job as really complete when the download is done, at least when the `gbasf2_download_dataset` b2luigi option is set. Something similar might be done for the submission.
This is not easy and I don't know if we can do both cases. The subprocess sometimes requires user input, e.g. a CA-certificate or SSH-key password, so that should still work. Error handling should also be thought about. As I don't have much experience with async subprocesses, I'd be happy about help.
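To make the idea more concrete, here is a rough sketch of how the download could be launched without blocking, using `subprocess.Popen` and polling instead of `asyncio` (an assumption on my side, not existing b2luigi code; the exact `gb2_ds_get` arguments are placeholders):

```python
import subprocess


class NonBlockingDownload:
    """Sketch: start the dataset download as a background subprocess and
    poll it, so ``get_job_status`` is not blocked while it runs."""

    def __init__(self):
        self._proc = None

    def start(self, dataset_path):
        if self._proc is None:
            # stdin stays attached to the terminal, so e.g. a certificate
            # password prompt could still be answered interactively
            self._proc = subprocess.Popen(["gb2_ds_get", dataset_path])

    def is_done(self):
        if self._proc is None:
            return False
        returncode = self._proc.poll()  # non-blocking status check
        if returncode is None:
            return False  # still downloading
        if returncode != 0:
            raise RuntimeError(f"Dataset download failed with exit code {returncode}")
        return True
```

`get_job_status` could then call `start()` once the project is done and only report the job as successful once `is_done()` returns true, so the scheduler keeps running while the download is in progress.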
If I'm just too stupid to use `parallel_scheduling` correctly, and with it properly enabled these blocking operations are no problem, then this can be closed. (Though `parallel_scheduling` also only works for picklable tasks.)
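As a side note, whether a given task instance would even survive parallel scheduling can be checked by trying to pickle it directly; this is just a generic check with the standard `pickle` module, not a b2luigi feature:

```python
import pickle


def is_picklable(task):
    """Return True if the task instance can be pickled, which luigi's
    parallel scheduling of completeness checks requires."""
    try:
        pickle.dumps(task)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
```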