reanahub / reana-workflow-engine-yadage

REANA Workflow Engine Yadage
http://reana-workflow-engine-yadage.readthedocs.io/
MIT License

Investigate slowness while running BSM-Search example #140

Open roksys opened 4 years ago

roksys commented 4 years ago

When trying to scale the BSM-Search example, it can take up to 3 minutes for the workflow engine to start submitting the first jobs.

To increase the number of jobs, the following modification is needed:

$ git diff
diff --git a/workflow/databkgmc.yml b/workflow/databkgmc.yml
index b41427e..e41ae7e 100644
--- a/workflow/databkgmc.yml
+++ b/workflow/databkgmc.yml
@@ -16,7 +16,7 @@ stages:
       parameters:
         mcname: [mc1,mc2]
         mcweight: [0.01875,0.0125]  # [Ndata / Ngen * 0.2 * 0.15,  Ndata / Ngen * 0.2 * 0.1] = [10/16*0.03, 1/16 * 0.02]
-        nevents:  [40000,40000,40000,40000]  #160k events / mc sample
+        nevents:   [5000,5000,5000,5000,5000,5000,5000,5000,5000,5000,5000,5000,5000,5000,5000,5000,5000,5000,5000,5000,5000,5000,5000,5000,5000,5000,5000,5000,5000,5000,5000,5000]
       workflow: {$ref: workflow/wflow_all_mc.yml}
   - name: data
     scheduler:
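
The edited nevents line splits the 160k events per MC sample (originally 4 x 40000) into 5000-event chunks, which works out to 32 entries. Typing such a list by hand is error-prone, so it can help to generate it. Below is a minimal sketch, assuming each nevents entry ends up scheduled as its own job; chunked_nevents and as_yaml_flow_list are hypothetical helpers, not part of yadage or REANA:

```python
def chunked_nevents(total_events: int, chunk_size: int) -> list[int]:
    """Split total_events into chunk_size pieces; the last chunk takes any remainder."""
    if total_events <= 0 or chunk_size <= 0:
        raise ValueError("total_events and chunk_size must be positive")
    full, rest = divmod(total_events, chunk_size)
    chunks = [chunk_size] * full
    if rest:
        chunks.append(rest)  # keep the total unchanged when it does not divide evenly
    return chunks


def as_yaml_flow_list(values: list[int]) -> str:
    """Render a list of ints as a YAML flow sequence, e.g. [5000,5000]."""
    return "[" + ",".join(str(v) for v in values) + "]"


if __name__ == "__main__":
    # Hypothetical usage: 160k events per MC sample, 5k events per job.
    nevents = chunked_nevents(160_000, 5_000)
    assert len(nevents) == 32 and sum(nevents) == 160_000
    print("nevents: " + as_yaml_flow_list(nevents))
```

Running it prints a nevents line that could be pasted into workflow/databkgmc.yml; changing the chunk size changes the number of entries while keeping the total event count fixed.
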
lukasheinrich commented 4 years ago

yadage has two modes:

The latter is the default; it helps with debugging and with doing "dry-runs". The behavior can be controlled using engine options, which we could expose in reana.yml.

roksys commented 4 years ago

Hi @lukasheinrich,

How can I set the first option? Is it one of the yadage-run options?

$ yadage-run --help
Usage: yadage-run [OPTIONS] DATAARG [WORKFLOW] [INITFILES]...

Options:
  -b, --backend TEXT              packtivity backend string
  -c, --cache TEXT
  -d, --dataopt TEXT              options for the workflow data state
  -e, --schemadir TEXT            schema directory for workflow validation
  -f, --from-file FILENAME        read entire configuration from file, no
                                  other flags settings are read.
  -g, --strategy TEXT             set execution stragegy
  -i, --loginterval INTEGER       adage tracking interval in seconds
  -k, --backendopt TEXT           options for the workflow data state
  -l, --modelopt TEXT             options for the workflow state models
  -m, --metadir TEXT              directory to store workflow metadata
  -o, --ctrlopt TEXT              options for the workflow controller
  -p, --parameter TEXT            <parameter name>=<yaml string> input
                                  parameter specifcations
  -r, --controller TEXT           controller
  -s, --modelsetup TEXT           wflow state model
  -t, --toplevel TEXT             toplevel uri to be used to resolve workflow
                                  name and references from
  -u, --updateinterval FLOAT      adage graph inspection interval in seconds
  -v, --verbosity TEXT            logging verbosity
  --accept-metadir / --no-accept-metadir
  --plugins TEXT
  --validate / --no-validate      en-/disable workflow spec validation
  --visualize / --no-visualize    visualize workflow graph
  --help                          Show this message and exit.
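
The -p/--parameter option above takes <parameter name>=<yaml string> pairs. The sketch below only illustrates how such pairs could be turned into an input-parameter dictionary; it is not yadage's implementation, parse_parameters is a hypothetical name, and it uses json.loads as a stand-in for a YAML parser, so only JSON-compatible values (numbers, quoted strings, lists of numbers) are decoded and anything else is kept as a raw string:

```python
import json
from typing import Any


def parse_parameters(specs: list[str]) -> dict[str, Any]:
    """Turn ["name=value", ...] into a dict.

    Each value is decoded with json.loads as a stand-in for a YAML parser;
    values that are not valid JSON (e.g. bare words like mc1) are kept as
    plain strings. A real implementation would use a YAML loader instead.
    """
    params: dict[str, Any] = {}
    for spec in specs:
        # Split on the first "=" only, so values may themselves contain "=".
        name, sep, raw = spec.partition("=")
        if not sep or not name:
            raise ValueError(f"expected <name>=<value>, got {spec!r}")
        try:
            params[name] = json.loads(raw)
        except json.JSONDecodeError:
            params[name] = raw  # fall back to the raw string
    return params


if __name__ == "__main__":
    print(parse_parameters(["nevents=[5000,5000]", "mcname=mc1"]))
```

For example, parse_parameters(["nevents=[5000,5000]", "mcname=mc1"]) gives {'nevents': [5000, 5000], 'mcname': 'mc1'}. A list of bare words such as [mc1,mc2] is valid YAML but not valid JSON, so this sketch would keep it as a string where a YAML loader would return a list.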