yarikoptic closed this issue 4 years ago
Absolutely crazy idea -- even a 'shell' runner (well, more of a bash one; a POSIX-compliant shell like posh might not support it fully) could report subjobs (e.g. triggered by an option) based on tracing execution using the PS4 env var and -x mode. E.g. consider a dummy script
$> cat /tmp/script.sh
#!/bin/bash
echo "doing one thing"
echo "and will be failing with another"
false
echo "but carrying on since no -e mode (an option?)"
true
running which results in
$> /tmp/script.sh
doing one thing
and will be failing with another
but carrying on since no -e mode (an option?)
which, if run with -x, would show
$> PS4='QME[$?][$(date +%DT%T)]> ' bash --noprofile -x /tmp/script.sh
... here goes lots of tracing of current profile settings etc. -- yet to figure out how to suppress that in such an invocation
QME[0][05/28/20T07:11:29]> echo 'doing one thing'
doing one thing
QME[0][05/28/20T07:11:29]> echo 'and will be failing with another'
and will be failing with another
QME[0][05/28/20T07:11:29]> false
QME[1][05/28/20T07:11:29]> echo 'but carrying on since no -e mode (an option?)'
but carrying on since no -e mode (an option?)
QME[0][05/28/20T07:11:29]> true
so it would be possible for an executor to parse that out into subjobs, report success/failure of any particular subcommand, and provide timing estimates.
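As a purely hypothetical sketch of such parsing (parse_trace is a made-up name, not anything in qme): since the $? in each QME[...] prefix reports the exit status of the *previous* traced command, the status on one trace line closes the subjob opened on the preceding one:

```shell
# Hypothetical parser sketch: turn QME[status][timestamp]> lines (as in the
# trace above) into a flat list of subjobs with their exit statuses.
parse_trace() {
  awk '
    /^QME\[/ {
      # the status in this prefix belongs to the previously traced command
      status = substr($0, 5, index($0, "]") - 5)
      if (prev_cmd != "")
        printf "subjob %d: exit=%s cmd=%s\n", ++n, status, prev_cmd
      # remember the command on this line for the next status report
      prev_cmd = $0
      sub(/^QME\[[^]]*\]\[[^]]*\]> /, "", prev_cmd)
    }
    END {
      # the last traced command has no following line to report its status
      if (prev_cmd != "")
        printf "subjob %d: exit=unknown cmd=%s\n", ++n, prev_cmd
    }
  '
}
```

Timing estimates could come the same way, by diffing the timestamps of consecutive QME prefixes.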
edit 1: since the -x trace goes to stderr, it would need redirecting (2>&1) so the output is properly annotated etc. Con: it would then be impossible to tell stderr apart from stdout.
In research scripts I think -e and -u modes should generally be used (and that is what we advise people to use), but many people do not even know about them, and their scripts often just keep plowing through the data, possibly producing garbage or removing what they must not touch (e.g., upon rm -rf $UNDEFINEDPREFIX/ ;)). That is why, orthogonal to subjobs reporting, an additional option for such an executor could trigger -e mode (and another one -u), so it would fail upon the first unhandled failed command:
$> PS4='QME[$?][$(date +%DT%T)]> ' bash --noprofile -x -e /tmp/script.sh
...
QME[0][05/28/20T07:13:33]> echo 'doing one thing'
doing one thing
QME[0][05/28/20T07:13:33]> echo 'and will be failing with another'
and will be failing with another
QME[0][05/28/20T07:13:33]> false
$> echo $?
1
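For reference, the invocations above could be folded into a toy executor wrapper like this (a sketch only; qme_shell_exec is a made-up name, not qme API, and --norc is added on the guess that it cuts down the profile-tracing noise mentioned earlier). Per "edit 1", stderr is merged into stdout:

```shell
# Hypothetical wrapper: run a user script under -x with the annotated PS4,
# merging stderr into stdout as per "edit 1" above.
# Extra arguments (e.g. -e -u for "strict" mode) are passed through to bash.
qme_shell_exec() {
  local script=$1; shift
  # --noprofile/--norc to reduce tracing of startup files
  PS4='QME[$?][$(date +%DT%T)]> ' \
    bash --noprofile --norc -x "$@" "$script" 2>&1
}
```

So `qme_shell_exec /tmp/script.sh` reproduces the first run above, and `qme_shell_exec /tmp/script.sh -e` the second, exiting 1 at the first failed command.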
Isn't this more of a workflow manager sort of thing? Qme is a dashboard for running tasks, but it's not a workflow manager. Orchestrating jobs sounds like something for a workflow manager. Are you suggesting that qme take some input script and try to break it into pieces? If you are looking for async, that would be for discussion in #2 (and it feels different because it's not sticking its hands into a user script, but rather running full sets of commands async). If I'm not fully understanding the use case, I'll need an actual toy example of commands to run and the expected output.
This is a repeat now of #30
Are you suggesting that qme take some input script and try to break it into pieces?
Not really. It is just for an executor to be able to report on the subjobs. Here it was just a crazy idea to report them for any (simple) shell script via such a custom executor.
Ah ok, so would qme be creating subjobs, or reading them from a script that creates them? If the latter, how often do user scripts do this?
Also, for the run-in-background idea -- why couldn't the user just add a & after qme run?
Description
Some executors might internally be running a number of subjobs. E.g. for a BIDS app run via reproman run, there could be one per subject analysis, each pointing to its own logs. It would be nice if, not unlike all the GitHub Actions workflows etc., a job could have associated subjobs to look at, with at least the ability to review their logs separately etc.
Then some jobs might need multiple stages. E.g. a BIDS app typically could have two stages -- per subject, and then a group-level run, like is ATM done in this prototype: https://github.com/ReproNim/reproman/pull/438/files#diff-5b4aa18b79cf44a38ba925fff658fd8cR165
Maybe it could then be a dedicated "bids-app reproman runner" which would expose two sets of subjobs: per-participant (as above) and then a group one. Making it possible to report a tree of subjobs would accommodate that. Or maybe in such cases it would just be a matter of adding new jobs conditioned on the results of previously added ones (i.e. I add a job for the per-participant analysis and say that the next, group one depends on successful completion of the other; sorry for clamping that additional issue in here).
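To make the two-stage pattern concrete, here is a toy sketch (bids_app, BIDS_DIR, OUT_DIR, and the subject list are all assumptions standing in for a real BIDS-app invocation, not reproman/qme API): each per-participant run is one subjob with its own log, and the group stage depends on all of them succeeding:

```shell
# Toy two-stage BIDS-app runner; "bids_app" is a hypothetical command and
# $BIDS_DIR/$OUT_DIR are assumed to be set by the caller.
run_bids_app() {
  local failed=0 sub
  for sub in sub-01 sub-02 sub-03; do
    # each per-participant run is one subjob with its own log file
    bids_app "$BIDS_DIR" "$OUT_DIR" participant \
      --participant-label "${sub#sub-}" >"$OUT_DIR/$sub.log" 2>&1 || failed=1
  done
  if [ "$failed" -eq 0 ]; then
    # the group subjob is conditioned on all participant subjobs succeeding
    bids_app "$BIDS_DIR" "$OUT_DIR" group >"$OUT_DIR/group.log" 2>&1
  else
    echo "skipping group stage: a participant subjob failed" >&2
    return 1
  fi
}
```

A subjob-aware executor would report each loop iteration (and its log file) as a child of the overall job, with the group stage as a dependent sibling.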