reanahub / reana-client

REANA command-line client
http://reana-client.readthedocs.io/
MIT License

batch submission of many workflows #493

Open lukasheinrich opened 3 years ago

lukasheinrich commented 3 years ago

For RECAST scans we often want to submit N (N=100 or so) workflows in one go. While we can submit them with a plain bash loop, it would be nice to be able to submit and manipulate a group of workflows together (submit/download/status).

e.g.

analysis1/reana.yml,analysis1/pars1.json
analysis1/reana.yml,analysis1/pars2.json
analysis1/reana.yml,analysis1/pars3.json
analysis2/reana.yml,analysis2/pars1.json
analysis2/reana.yml,analysis2/pars2.json
analysis2/reana.yml,analysis2/pars3.json

tiborsimko commented 3 years ago

WRT submit, if you try to run 100 of them in parallel, most would go into the incoming queue anyway, waiting for runtime slots to free up. So out of 100 submitted workflows, a typical outcome would be 10 running and 90 queued. So submission could still be done via a tiny outer shell loop, I guess.
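For reference, the outer loop over a list like the one in the first comment could be sketched in Python as below. This is only a sketch under assumptions: the `spec,params` line format is taken from the example above, the workflow naming scheme is made up for illustration, flattening each JSON file into `-p key=value` options is one guess at how the parameters would be passed, and the `run -w ... -f ... -p ...` options should be checked against the installed reana-client version. The functions only build command lines; nothing is submitted.

```python
import json
import shlex
from pathlib import Path


def parse_manifest(text):
    """Parse 'spec,params' lines into (spec_path, params_path) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        spec, params = (part.strip() for part in line.split(",", 1))
        pairs.append((spec, params))
    return pairs


def workflow_name(spec, params):
    # Hypothetical naming scheme: "<analysis directory>-<params file stem>".
    return f"{Path(spec).parent.name}-{Path(params).stem}"


def load_params(path):
    """Read a flat JSON object of parameters (assumed file layout)."""
    return json.loads(Path(path).read_text())


def build_command(spec, params, values):
    """Build one reana-client invocation as an argv list, without running it."""
    cmd = ["reana-client", "run", "-w", workflow_name(spec, params), "-f", spec]
    for key, value in sorted(values.items()):
        cmd += ["-p", f"{key}={value}"]  # assumption: one -p per JSON key
    return cmd


# Demo with an inline manifest and a placeholder parameter dictionary.
manifest = """\
analysis1/reana.yml,analysis1/pars1.json
analysis2/reana.yml,analysis2/pars1.json
"""
for spec, params in parse_manifest(manifest):
    # In real use: values = load_params(params)
    values = {"example_param": 1}  # hypothetical placeholder
    print(shlex.join(build_command(spec, params, values)))
```

Each resulting list could be handed to `subprocess.run(cmd, check=True)` to actually submit. That is still one client invocation per workflow, so it does not reduce the number of API calls.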

WRT getting status, this is indeed useful, and currently not easily possible without an outer shell loop either. We were musing about adding a filtering option to many reana-client commands that would let you list only the workflows you are interested in, for example:

$ reana-client list --filter name=bsm --include-progress
NAME              RUN_NUMBER   CREATED               STARTED               ENDED                 STATUS      PROGRESS
bsm09             2            2021-04-05T18:12:31   2021-04-05T18:12:32   -                     running         6/12
bsm08             2            2021-04-05T18:12:29   2021-04-05T18:12:30   2021-04-05T18:22:56   finished       12/12
bsm07             2            2021-04-05T18:12:22   2021-04-05T18:12:24   -                     running         8/12
...

would display the progress statuses only for those workflows that are named *bsm*. See #510. (CC @ParthS007)
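Until such a filter exists, the same selection could be approximated on the client side. The sketch below assumes the listing has already been parsed into dictionaries with `name`, `status` and `progress` keys. That row shape is hypothetical and is not the client's actual output format. The sample rows are copied from the listing above.

```python
def filter_by_name(rows, fragment):
    """Keep only rows whose workflow name contains the given fragment."""
    return [row for row in rows if fragment in row["name"]]


def progress_fraction(progress):
    """Turn a 'done/total' string such as '6/12' into a float in [0, 1]."""
    done, total = (int(part) for part in progress.split("/"))
    return done / total if total else 0.0


# Rows copied from the example listing, plus one non-matching name
# ("other01" is a made-up entry to show that it gets filtered out).
rows = [
    {"name": "bsm09", "status": "running", "progress": "6/12"},
    {"name": "bsm08", "status": "finished", "progress": "12/12"},
    {"name": "other01", "status": "running", "progress": "1/4"},
]

for row in filter_by_name(rows, "bsm"):
    print(row["name"], row["status"], f"{progress_fraction(row['progress']):.0%}")
```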

WRT download, how are you picturing it? Imagine you have BSM workflow run 1, run 2 and run 3. Would you download some file into the 1 and 2 subdirectories in this case? I guess using a tiny outer shell loop may be the easiest solution here...
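One possible layout for the download case is one subdirectory per run number, which could be sketched as follows. The `<base>/<name>/<run>` directory scheme is an assumption for illustration, as is the use of the `name.run_number` workflow notation together with `-w` and `-o` on `reana-client download`. Both options should be checked against the installed client version. The function only builds the command line and does not run it.

```python
from pathlib import Path


def download_dir(base, name, run_number):
    """Hypothetical layout: <base>/<workflow name>/<run number>."""
    return Path(base) / name / str(run_number)


def download_command(name, run_number, filename, base="results"):
    """Build a reana-client download invocation as an argv list."""
    target = download_dir(base, name, run_number)
    return [
        "reana-client", "download", filename,
        "-w", f"{name}.{run_number}",  # assumed 'name.run_number' notation
        "-o", str(target),             # assumed output-directory option
    ]


# Demo: the same file from runs 1, 2 and 3 of a workflow named "bsm".
# "output.json" is a placeholder file name.
for run in (1, 2, 3):
    print(" ".join(download_command("bsm", run, "output.json")))
```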

lukasheinrich commented 3 years ago

A shell loop works of course, but it could result in many repeated API calls to the server. E.g. Condor has a similar concept: queue N instead of looping over condor_submit. The former is much faster than the latter.

With a batch submission it could all be wrapped in a single API call.

tiborsimko commented 3 years ago

Yes, one API call vs many API calls could make a difference if you are submitting, say, hundreds of workflows... What would be a typical number?