aplowman opened 5 months ago
We could potentially get it down to a single hpcflow invocation, if that command also executes the action commands via `subprocess.run`. We would wait for the subprocess to finish and then do the post-run steps. We would need to check that the environment is inherited correctly on all three supported OSes.

(There is a way to avoid creating a sub-process by replacing the current process with `os.exec*`, but in that case we would then need another invocation to do the post-run steps, so we wouldn't gain anything.)
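As a rough illustration, here is a minimal Python sketch of the single-invocation idea. All function names below are hypothetical placeholders, not the actual hpcflow API:

```python
import subprocess


def set_ear_start(ear_id: int) -> None:
    """Hypothetical placeholder for hpcflow's pre-run bookkeeping."""


def set_ear_end(ear_id: int, exit_code: int) -> None:
    """Hypothetical placeholder for hpcflow's post-run bookkeeping."""


def run_action(commands_file: str, ear_id: int) -> int:
    """Run the action commands from within a single hpcflow invocation."""
    set_ear_start(ear_id)
    # subprocess.run inherits the parent's environment by default (env=None);
    # this inheritance is what we would need to verify on all three OSes.
    proc = subprocess.run(["bash", commands_file])
    # Because we waited for the subprocess to finish, the post-run steps can
    # happen here, in the same process, with no further hpcflow start-up cost.
    set_ear_end(ear_id, exit_code=proc.returncode)
    return proc.returncode
```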
For initial reference, I've timed parts of the jobscript (in seconds) for (single-action) array jobs of different sizes below, using the workflow defined at the bottom, which runs a simple Python script (taking a single parameter and outputting a single parameter). "Action loop time" is the total time within the action loop (a loop with one iteration in this case, action 0); "element time" is the total run time of the jobscript.
`N=10`:

```
action 0: cut time: N=10 mean=0.0 std=0.00
action 0: get EAR skipped time: N=10 mean=11.8 std=9.92
action 0: write commands time: N=10 mean=4.4 std=4.32
action 0: set EAR start time: N=10 mean=3.7 std=3.41
action 0: commands execution time: N=10 mean=6.0 std=3.44
action 0: set EAR end time: N=10 mean=4.1 std=4.23
action loop time: N=10 mean=30.1 std=21.92
element time: N=10 mean=30.2 std=22.12
```

`N=1000`:

```
action 0: cut time: N=1000 mean=0.0 std=0.04
action 0: get EAR skipped time: N=1000 mean=2.9 std=4.06
action 0: write commands time: N=1000 mean=2.2 std=1.24
action 0: set EAR start time: N=1000 mean=2.1 std=1.10
action 0: commands execution time: N=1000 mean=3.7 std=3.55
action 0: set EAR end time: N=1000 mean=2.2 std=1.58
action loop time: N=1000 mean=13.1 std=9.52
element time: N=1000 mean=13.1 std=9.55
```

`N=5000`:

```
action 0: cut time: N=5000 mean=0.0 std=0.06
action 0: get EAR skipped time: N=5000 mean=4.1 std=9.70
action 0: write commands time: N=5000 mean=2.5 std=2.07
action 0: set EAR start time: N=5000 mean=2.2 std=1.79
action 0: commands execution time: N=5000 mean=1.8 std=1.47
action 0: set EAR end time: N=5000 mean=2.3 std=1.60
action loop time: N=5000 mean=13.0 std=14.98
element time: N=5000 mean=13.0 std=15.24
```
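Each summary line above aggregates raw per-element timings into an `N`/`mean`/`std` form. A minimal Python sketch of that aggregation (hypothetical helper, not part of hpcflow; whether sample or population standard deviation was used is an assumption here):

```python
import statistics


def summarise(label: str, times_s: list[float]) -> str:
    """Reduce raw per-element timings (in seconds) to the N/mean/std form above."""
    mean = statistics.mean(times_s)
    # Assumption: sample standard deviation; the original may use population std.
    std = statistics.stdev(times_s) if len(times_s) > 1 else 0.0
    return f"{label}: N={len(times_s)} mean={mean:.1f} std={std:.2f}"


print(summarise("action 0: commands execution time", [6.0, 3.0, 9.0]))
```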
Results:

- In each case, the slowest of the four hpcflow invocations is the first one (`get-ear-skipped`).

Methodology:

- Run on icelake nodes.
- Timings collected with bash's `$SECONDS` at various points in the jobscript, introduced temporarily in this branch: https://github.com/hpcflow/hpcflow-new/commits/fix/large-workflow/

Test workflow template:
```yaml
doc: |
  A workflow for benchmarking the overhead introduced by hpcflow in running a Python
  script `N` times.
template_components:
  task_schemas:
    - objective: run_script
      inputs:
        - parameter: p1
      outputs:
        - parameter: p2
      actions:
        - environments:
            - scope:
                type: any
              environment: python_env
          script: <<script:/absolute/path/to/main_script_test_direct_in_direct_out.py>>
          script_exe: python_script
          script_data_in: direct
          script_data_out: direct
resources:
  any:
    scheduler_args:
      options:
        --time: 00:10:00
        --partition: <<var:partition[default=icelake]>>
tasks:
  - schema: run_script
    inputs:
      p1: 101
    repeats: <<var:N[default=1]>>
```
Script `main_script_test_direct_in_direct_out.py`:
```python
def main_script_test_direct_in_direct_out(p1):
    # process
    p2 = p1 + 100
    # return outputs
    return {"p2": p2}
```
We can speed up execution in a few ways.

**Invoke hpcflow fewer times**

In jobscripts, we currently call four hpcflow commands for each action run:

```bash
skip=`wkflow_app internal workflow "$WK_PATH_ARG" get-ear-skipped $EAR_ID 2>> "$app_stream_file"`
wkflow_app internal workflow "$WK_PATH_ARG" write-commands $SUB_IDX $JS_IDX $JS_act_idx $EAR_ID >> "$app_stream_file" 2>&1
wkflow_app internal workflow "$WK_PATH_ARG" set-ear-start $EAR_ID >> "$app_stream_file" 2>&1
wkflow_app internal workflow "$WK_PATH_ARG" set-ear-end $JS_IDX $JS_act_idx $EAR_ID "--" "$exit_code" >> "$app_stream_file" 2>&1
```

These should be combined into two commands, which will reduce the overhead of starting (and unpacking, if using the built executable) hpcflow, for example:

- a pre-run command covering `get-ear-skipped`, `write-commands` and `set-ear-start`;
- a post-run command covering `set-ear-end`.

A sketch of this pairing follows.
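A minimal Python sketch of the combined pair (the helper names stand in for the existing per-step logic and are hypothetical, not the actual hpcflow internals):

```python
def get_ear_skipped(ear_id: int) -> bool:
    """Hypothetical placeholder for the existing skip check."""
    return False


def write_commands(ear_id: int) -> None:
    """Hypothetical placeholder for writing the command file."""


def set_ear_start(ear_id: int) -> None:
    """Hypothetical placeholder for recording the start time."""


def set_ear_end(ear_id: int, exit_code: int) -> None:
    """Hypothetical placeholder for recording the end time and exit code."""


def pre_run(ear_id: int) -> bool:
    """One hpcflow start-up instead of three: skip check, write the
    commands, and set the EAR start time in a single invocation."""
    if get_ear_skipped(ear_id):
        return False
    write_commands(ear_id)
    set_ear_start(ear_id)
    return True


def post_run(ear_id: int, exit_code: int) -> None:
    """Second (and final) hpcflow invocation for the run."""
    set_ear_end(ear_id, exit_code)
```

The jobscript would then call the pre-run command once before executing the commands file and the post-run command once after it, halving the number of hpcflow start-ups per action run.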
**Reuse command and source files**

Command files (e.g. `js_0_act_0.sh`) and source files (e.g. Python scripts) could be reused across elements, rather than written once per element run, e.g. written a single time into the workflow's artifacts directory by the pre-run command above; see the sketch below.
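A minimal sketch of the reuse idea (hypothetical path layout and function; assumes the command file content is identical across elements of the same jobscript action):

```python
from pathlib import Path


def ensure_commands_file(artifacts_dir: Path, js_idx: int, act_idx: int, body: str) -> Path:
    """Write the command file once per (jobscript, action) pair and reuse it
    across elements, instead of writing one file per element run."""
    path = artifacts_dir / f"js_{js_idx}_act_{act_idx}.sh"  # e.g. js_0_act_0.sh
    if not path.exists():
        path.write_text(body)
    return path
```

This assumes any per-element values are passed to the reused file some other way (e.g. via environment variables) rather than baked into its contents.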