radical-collaboration / hpc-workflows

NSF16514 EarthCube Project - Award Number:1639694

Longer runtime in large scale simulation #133

Closed Weiming-Hu closed 3 years ago

Weiming-Hu commented 3 years ago

Problem description

I observe a longer runtime when I execute the exact same workflow at a larger scale. My program is multi-threaded, and each sub-routine is timed separately. As far as I can tell, this is not due to I/O contention, because the I/O time in the large-scale run is consistent with that in the small-scale run. Instead, the compute stages themselves take longer, so there may be a different cause.

Log files

I have prepared my log files on Cheyenne. The EnTK job script is /glade/u/home/wuh20/github/pv-workflow/02_WeightOptimization/02_SearchWeights.py. The sandbox files are located under /glade/u/home/wuh20/scratch/entk_log_backup. To be more specific:

  1. re.session.cheyenne4.wuh20.018662.0015: The small-scale run with just a few unit folders
  2. re.session.cheyenne6.wuh20.018662.0019: The large-scale run with many unit folders

For example, server/re.session.cheyenne4.wuh20.018662.0015/pilot.0000/unit.000005/unit.000005.sh and server/re.session.cheyenne6.wuh20.018662.0019/pilot.0000/unit.016001/unit.016001.sh carry out the same task. You can verify this by looking at the unit job script file.
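One way to check that the two unit scripts really do the same work is to diff them. The snippet below is a minimal sketch using Python's standard `difflib`; the helper name `diff_unit_scripts` is hypothetical, and some differences (unit IDs, sandbox paths) should be expected even when the task is the same.

```python
import difflib
from pathlib import Path


def diff_unit_scripts(path_a, path_b):
    """Return the unified diff between two unit scripts as a list of lines.

    An empty list means the two files are identical line for line.
    """
    lines_a = Path(path_a).read_text().splitlines()
    lines_b = Path(path_b).read_text().splitlines()
    return list(
        difflib.unified_diff(
            lines_a,
            lines_b,
            fromfile=str(path_a),
            tofile=str(path_b),
            lineterm="",
        )
    )


# Example call, using the two paths quoted above (relative to the backup folder):
# for line in diff_unit_scripts(
#     "server/re.session.cheyenne4.wuh20.018662.0015/pilot.0000/unit.000005/unit.000005.sh",
#     "server/re.session.cheyenne6.wuh20.018662.0019/pilot.0000/unit.016001/unit.016001.sh",
# ):
#     print(line)
```

Lines that differ only in unit number or working directory can be ignored; any difference in the executable, its arguments, or the thread settings would mean the two units are not directly comparable.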

However, if you look at the end of the corresponding .out files, the timing profiles are quite different.

For the small scale run:

*************** Simple Clock ***************
Total wall time: 446.13 s
Open a read connection: 0.01 s (0.00%)
Read analogs: 0.02 s (0.00%)
Read forecasts and observations: 0.03 s (0.01%)
Read meta: 0.00 s (0.00%)
Preprocess simulation data: 0.04 s (0.01%)
Calculate sky conditions: 185.26 s (41.53%)
Write sky conditions: 0.02 s (0.00%)
Open a read connection: 0.14 s (0.03%)
Simulate scenario 00000 for analogs: 239.05 s (53.58%)
Write scenario 00000: 0.01 s (0.00%)
Simulate scenario 00000 for fcsts: 10.30 s (2.31%)
Write scenario 00000: 0.00 s (0.00%)
Simulate scenario 00000 for obs: 11.25 s (2.52%)
Write scenario 00000: 0.00 s (0.00%)
*********** End of Simple Clock ************

For the large scale run:

*************** Simple Clock ***************
Total wall time: 778.46 s
Open a read connection: 0.01 s (0.00%)
Read analogs: 0.33 s (0.04%)
Read forecasts and observations: 0.01 s (0.00%)
Read meta: 0.00 s (0.00%)
Preprocess simulation data: 0.00 s (0.00%)
Calculate sky conditions: 376.29 s (48.34%)
Write sky conditions: 0.10 s (0.01%)
Open a read connection: 0.11 s (0.01%)
Simulate scenario 00000 for analogs: 364.35 s (46.80%)
Write scenario 00000: 0.08 s (0.01%)
Simulate scenario 00000 for fcsts: 20.31 s (2.61%)
Write scenario 00000: 0.00 s (0.00%)
Simulate scenario 00000 for obs: 16.87 s (2.17%)
Write scenario 00000: 0.00 s (0.00%)
*********** End of Simple Clock ************
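To make the comparison concrete, the two profiles can be parsed and the per-stage slowdown computed. The following is a rough sketch, not part of the workflow itself: the function names and the 1-second threshold are assumptions chosen for illustration, and only a subset of the log lines above is pasted in.

```python
import re

# Matches lines such as "Calculate sky conditions: 185.26 s (41.53%)".
CLOCK_LINE = re.compile(r"^(?P<label>.+?):\s+(?P<secs>\d+(?:\.\d+)?) s\b")


def parse_simple_clock(text):
    """Return {label: seconds}; labels that repeat are summed."""
    timings = {}
    for raw in text.splitlines():
        match = CLOCK_LINE.match(raw.strip())
        if match:
            label = match.group("label")
            timings[label] = timings.get(label, 0.0) + float(match.group("secs"))
    return timings


def slowdown(small, large, min_seconds=1.0):
    """Ratio large/small for stages lasting at least min_seconds in the small run.

    The threshold (an arbitrary choice) skips sub-second stages, whose
    ratios are dominated by timer resolution.
    """
    return {
        label: large[label] / small[label]
        for label in small
        if label in large and small[label] >= min_seconds
    }


small_log = """\
Total wall time: 446.13 s
Read analogs: 0.02 s (0.00%)
Calculate sky conditions: 185.26 s (41.53%)
Simulate scenario 00000 for analogs: 239.05 s (53.58%)
Simulate scenario 00000 for fcsts: 10.30 s (2.31%)
Simulate scenario 00000 for obs: 11.25 s (2.52%)
"""

large_log = """\
Total wall time: 778.46 s
Read analogs: 0.33 s (0.04%)
Calculate sky conditions: 376.29 s (48.34%)
Simulate scenario 00000 for analogs: 364.35 s (46.80%)
Simulate scenario 00000 for fcsts: 20.31 s (2.61%)
Simulate scenario 00000 for obs: 16.87 s (2.17%)
"""

ratios = slowdown(parse_simple_clock(small_log), parse_simple_clock(large_log))
for label, ratio in ratios.items():
    print(f"{label}: {ratio:.2f}x")
```

With these numbers, every compute stage is roughly 1.5x to 2x slower in the large-scale run, while the read and write stages remain well under a second in both runs.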

I would appreciate any insight into this. Thank you.