hpcflow / hpcflow-new

Mozilla Public License 2.0

Support multi-element execution within Python scripts #673

Open aplowman opened 7 months ago

aplowman commented 7 months ago

I am testing a subset simulation workflow and finding it difficult to run locally because the jobscript-level element loop is slow. To alleviate this, we should support (for the specific case of actions that simply run Python scripts) running the script for multiple elements within the same invocation.

When an action script is written (with script_data_in: direct and script_data_out: direct), boilerplate code like this is added to the script file that is generated within the run directory (sample_direct_MC.py is the name of the script in this particular case):

def sample_direct_MC(...):
    ...
    return ...

if __name__ == "__main__":
    # ...
    # ... parse CLI arguments, import app, set config, etc.
    # ...
    wk_path, EAR_ID = args.wk_path, args.run_id
    wk = app.Workflow(wk_path)
    EAR = wk.get_EARs_from_IDs([EAR_ID])[0]
    direct_ins = EAR.get_input_values_direct()
    outputs = sample_direct_MC(**direct_ins)
    outputs = {"outputs." + k: v for k, v in outputs.items()}
    for name_i, out_i in outputs.items():
        wk.set_parameter_value(param_id=EAR.data_idx[name_i], value=out_i)

To support this new feature, we would instead write something like:

def sample_direct_MC(...):
    ...
    return ...

if __name__ == "__main__":
    # ...
    # ... parse CLI arguments, import app, set config, etc.
    # ...
    wk_path, EAR_IDs = args.wk_path, args.run_ids  # note: plural
    wk = app.Workflow(wk_path)
    runs = wk.get_EARs_from_IDs(EAR_IDs)  # keep the whole list (no [0])
    for run_i in runs:
        direct_ins = run_i.get_input_values_direct()
        outputs = sample_direct_MC(**direct_ins)
        outputs = {"outputs." + k: v for k, v in outputs.items()}
        for name_i, out_i in outputs.items():
            wk.set_parameter_value(param_id=run_i.data_idx[name_i], value=out_i)

i.e. we introduce to the boilerplate a loop over runs, and call the script's main function within this loop.
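
To try the loop logic without a real workflow, here is a minimal sketch using stand-in objects. `StubRun`, `StubWorkflow` and `run_all` are hypothetical names invented for this illustration and are not part of hpcflow. The stubs only mimic the three methods used in the boilerplate above, and the body of `sample_direct_MC` is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class StubRun:
    """Hypothetical stand-in for a run (EAR) object."""

    inputs: dict
    data_idx: dict  # maps e.g. "outputs.y" to a parameter ID

    def get_input_values_direct(self):
        return dict(self.inputs)


@dataclass
class StubWorkflow:
    """Hypothetical stand-in for app.Workflow; parameters live in a dict."""

    runs: dict
    params: dict = field(default_factory=dict)

    def get_EARs_from_IDs(self, ids):
        return [self.runs[i] for i in ids]

    def set_parameter_value(self, param_id, value):
        self.params[param_id] = value


def sample_direct_MC(x):
    # Placeholder for the script's main function.
    return {"y": 2 * x}


def run_all(wk, run_ids, func):
    """Call `func` once per run and store each run's outputs."""
    for run_i in wk.get_EARs_from_IDs(run_ids):
        direct_ins = run_i.get_input_values_direct()
        outputs = func(**direct_ins)
        outputs = {"outputs." + k: v for k, v in outputs.items()}
        for name_i, out_i in outputs.items():
            wk.set_parameter_value(param_id=run_i.data_idx[name_i], value=out_i)


wk = StubWorkflow(
    runs={
        0: StubRun(inputs={"x": 1}, data_idx={"outputs.y": 10}),
        1: StubRun(inputs={"x": 5}, data_idx={"outputs.y": 11}),
    }
)
run_all(wk, [0, 1], sample_direct_MC)
```

Each run writes to its own parameter IDs via its own `data_idx`, so outputs from different elements do not overwrite each other even though they share one Python process.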

Possible future work after initial implementation:

aplowman commented 7 months ago

The jobscript would also need to be modified to support this!
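
On the script side, one way the jobscript could pass several run IDs in a single invocation is as a multi-valued CLI option. This is only a sketch: the flag names `--wk-path` and `--run-ids` are assumptions chosen to match `args.wk_path` and `args.run_ids` above, not the existing hpcflow CLI.

```python
import argparse


def parse_args(argv=None):
    # Hypothetical flags; the real generated boilerplate may differ.
    parser = argparse.ArgumentParser()
    parser.add_argument("--wk-path", dest="wk_path", required=True)
    # nargs="+" collects one or more integers into a list
    parser.add_argument(
        "--run-ids", dest="run_ids", type=int, nargs="+", required=True
    )
    return parser.parse_args(argv)


args = parse_args(["--wk-path", "/tmp/wk", "--run-ids", "3", "4", "5"])
```

The jobscript would then call the script once per batch of elements rather than once per element.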