nikhilgupta10 / GridLAB-D

Other
1 stars 0 forks source link

#573 Profile results don't look right for multithreaded profiling, #2318

Open nikhilgupta10 opened 7 years ago

nikhilgupta10 commented 7 years ago

It looks like the profiler results for multithreading core runtimes are not correct.

Output for --profile -T 1 test_R3-12.47-3_NR.glm:

Core profiler results
======================

Total objects              12646 objects
Parallelism                    1 thread
Total time                  90.0 seconds
  Core time                  6.0 seconds (6.7%)
    Compiler                 7.6 seconds (8.5%)
    Instances                0.0 seconds (0.0%)
    Random variables         0.0 seconds (0.0%)
    Schedules                0.0 seconds (0.0%)
    Loadshapes               0.0 seconds (0.0%)
    Enduses                  0.0 seconds (0.0%)
    Transforms               0.0 seconds (0.0%)
  Model time                84.0 seconds/thread (93.3%)
Simulation time                0 days
Simulation speed             141 object.hours/second
Syncs completed              250 passes
Time steps completed          62 timesteps
Convergence efficiency      4.03 passes/timestep
Read lock contention        0.0%
Write lock contention       0.0%
Average timestep             58 seconds/timestep
Simulation rate              40 x realtime

Model profiler results
======================

Class            Time (s) Time (%) msec/obj
---------------- -------- -------- --------
node              65.770     78.3%     32.9
triplex_node       4.400      5.2%      1.4
overhead_line      3.914      4.7%      2.6
transformer        2.983      3.6%      1.7
triplex_meter      2.840      3.4%      1.7
triplex_line       2.605      3.1%      1.6
underground_line   0.549      0.7%      1.7
fuse               0.406      0.5%      2.9
meter              0.174      0.2%      1.6
load               0.170      0.2%      1.6
switch             0.111      0.1%      5.3
recorder           0.032      0.0%     32.0
================ ======== ======== ========
Total             83.954    100.0%      6.6

Output for --profile -T 2 test_R3-12.47-3_NR.glm:

Core profiler results
======================

Total objects              12646 objects
Parallelism                    2 threads
Total time                  73.0 seconds
  Core time                 36.0 seconds (49.3%)
    Compiler                 8.1 seconds (11.1%)
    Instances                0.0 seconds (0.0%)
    Random variables         0.0 seconds (0.0%)
    Schedules                0.0 seconds (0.0%)
    Loadshapes               0.0 seconds (0.0%)
    Enduses                  0.0 seconds (0.0%)
    Transforms               0.0 seconds (0.0%)
  Model time                37.0 seconds/thread (50.7%)
Simulation time                0 days
Simulation speed             173 object.hours/second
Syncs completed              250 passes
Time steps completed          62 timesteps
Convergence efficiency      4.03 passes/timestep
Read lock contention        0.0%
Write lock contention       2.5%
Average timestep             58 seconds/timestep
Simulation rate              49 x realtime

Model profiler results
======================

Class            Time (s) Time (%) msec/obj
---------------- -------- -------- --------
node              60.499     81.7%     30.2
overhead_line      3.410      4.6%      2.3
triplex_node       3.338      4.5%      1.0
transformer        2.006      2.7%      1.2
triplex_meter      1.910      2.6%      1.2
triplex_line       1.645      2.2%      1.0
underground_line   0.715      1.0%      2.2
fuse               0.219      0.3%      1.6
meter              0.127      0.2%      1.2
load               0.093      0.1%      0.9
capacitor          0.062      0.1%     31.0
switch             0.032      0.0%      1.5
================ ======== ======== ========
Total             74.056    100.0%      5.9

,

nikhilgupta10 commented 7 years ago

nikhilgupta10 imported these comments from Sourceforge: "dchassin": * status changed from new to accepted

The problem is related to the use of clock() as the basis for timing. The clock() function evaluate the cpu time used, not wall clock time elapsed. Recommend using a new exec_clock() function to evaluate elapsed wall clock time instead.

Another suggestion is to evaluate both times so that performance can be evaluated more carefully with respect to both wall clock and cpu time.

,

"dchassin": * owner changed from dchassin to jcfuller

The main problem is addressed by changeset:3487. We need to do manual inspection of profile reports to verify that this has resolved the problem. If the fix is confirmed, please reassign the ticket back to dchassin to address the second issue (outputting both cpu and wall time).

,

"dchassin": * priority changed from validate to minor

Diff: