materialsproject / fireworks

The Fireworks Workflow Management Repo.
https://materialsproject.github.io/fireworks

Performance bug with update on large workflows in rlaunch rapidfire #49

Closed (dangunter closed this issue 10 years ago)

dangunter commented 10 years ago

Running "rlaunch rapidfire" on a single node uncovered a long delay (8-10 s) between tasks for a large workflow of 10,000 items (5,000 sequences of 2 items each). The delay had two sources: (1) a hostname lookup that was not being cached, which accounted for about 5 s of the delay and was fixed immediately; and (2) a performance problem in the workflow update performed after each job was launched.
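The first fix amounts to resolving the hostname once per process instead of once per task. The sketch below is a minimal illustration, not FireWorks' actual code: it assumes the slow lookup behaves like `socket.getfqdn()`, which may block on a DNS query, and `cached_fqdn` and `launch_tasks` are hypothetical names.

```python
import functools
import socket


@functools.lru_cache(maxsize=None)
def cached_fqdn():
    # Assumption: the slow "hostname lookup" resembles socket.getfqdn(),
    # which can block on DNS. lru_cache stores the first result, so later
    # calls in the same process return immediately.
    return socket.getfqdn()


def launch_tasks(n):
    # Hypothetical launch loop: every task records the host it ran on,
    # but only the first call actually performs the lookup.
    records = []
    for i in range(n):
        records.append({"task": i, "host": cached_fqdn()})
    return records
```

Calling `launch_tasks(5)` on a fresh cache and then `cached_fqdn.cache_info()` reports one miss and four hits, i.e. a single real lookup shared by all five tasks.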

To reproduce, save the script from this gist https://gist.github.com/dangunter/9939755 as build_wf.py and run:

    mkdir abcd
    python build_wf.py --output abcd --type sequence --tasks 10000
    lpad add abcd/fw_sequence_10000.yaml
    rlaunch rapidfire

Then note the pause between tasks.
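To see why a per-task delay matters at this scale, the sketch below builds the workflow shape described in the report (10,000 tasks as 5,000 independent chains of 2) as a plain parent-to-children dict and estimates the accumulated overhead. It is hypothetical: `sequence_links` and `overhead_hours` are illustrative helpers, not the output of build_wf.py or part of the FireWorks API, and the estimate simply multiplies task count by per-task delay.

```python
def sequence_links(n_tasks, chain_len=2):
    """Map each task ID to its child IDs for n_tasks split into independent chains."""
    if n_tasks % chain_len:
        raise ValueError("n_tasks must be a multiple of chain_len")
    links = {}
    for start in range(0, n_tasks, chain_len):
        end = start + chain_len
        for fw_id in range(start, end):
            # The last task in each chain has no children.
            links[fw_id] = [fw_id + 1] if fw_id + 1 < end else []
    return links


def overhead_hours(n_tasks, delay_s):
    # Rough estimate: one delay per launched task, converted to hours.
    return n_tasks * delay_s / 3600.0


links = sequence_links(10000)
print(len(links), sum(len(children) for children in links.values()))
print(round(overhead_hours(10000, 8), 1), round(overhead_hours(10000, 10), 1))
```

With 10,000 tasks and 5,000 parent-child links, an 8-10 s pause per task adds roughly 22-28 hours of pure overhead to a rapidfire run; the ~5 s hostname lookup alone accounts for about 14 of those hours.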

computron commented 10 years ago

I tried this with the latest version of FW (v0.95). There is still a delay between tasks, but I believe it is now smaller than before and in line with what is reported in the submitted paper.

Let me know if you see any anomalous behavior.