Open cliffckerr opened 2 months ago
Script:
import sciris as sc
import starsim as ss
kw = dict(n_agents=10e3, start=2000, end=2100, diseases='sis', plot=False)
prof = 1
if prof:
    context = sc.cprofile(sort='selfpct', mintime=1e-1)
else:
    context = sc.timer()
with context:
    sim = ss.demo(**kw)
Slow:
func cumpct selfpct cumtime selftime calls path
0 add_edge 14.6439 14.3256 0.1165 1.1399e-01 51710 multidigraph.py:416
1 set_prognoses 20.3468 3.1097 0.1619 2.4744e-02 102 disease.py:94
2 make_new_cases 49.8485 2.4438 0.3967 1.9446e-02 101 disease.py:247
3 append 16.5239 1.8800 0.1315 1.4959e-02 51710 disease.py:377
4 rvs 17.2986 1.8735 0.1376 1.4908e-02 611 distributions.py:520
5 _handle_fromlist 13.5303 0.3817 0.1077 3.0370e-03 4897 <frozen importlib._bootstrap>:1207
6 load_additional_registries 15.4045 0.2278 0.1226 1.8130e-03 248 cpu.py:60
7 step 73.3433 0.1977 0.5836 1.5733e-03 101 sim.py:167
8 set_prognoses 23.9644 0.1740 0.1907 1.3842e-03 102 sir.py:128
9 _set_cases 24.0492 0.1045 0.1914 8.3182e-04 101 disease.py:305
10 run 100.5591 0.0824 0.8002 6.5575e-04 1 sim.py:234
11 _find_and_load 13.0682 0.0316 0.1040 2.5143e-04 88 <frozen importlib._bootstrap>:1165
12 update 13.2567 0.0308 0.1055 2.4541e-04 101 network.py:499
13 _find_and_load_unlocked 13.0037 0.0307 0.1035 2.4405e-04 76 <frozen importlib._bootstrap>:1120
14 refresh 19.5396 0.0271 0.1555 2.1549e-04 248 base.py:261
15 _load_unlocked 12.7827 0.0221 0.1017 1.7560e-04 75 <frozen importlib._bootstrap>:666
16 exec_module 12.6857 0.0142 0.1009 1.1322e-04 74 <frozen importlib._bootstrap_external>:934
17 _call_with_frames_removed 12.9244 0.0092 0.1028 7.2965e-05 202 <frozen importlib._bootstrap>:233
18 initialize 25.3744 0.0060 0.2019 4.7907e-05 1 sim.py:47
19 compile 20.2575 0.0057 0.1612 4.5254e-05 2 dispatcher.py:907
20 demo 100.7041 0.0036 0.8013 2.8979e-05 1 sim.py:706
21 set_seed 22.5082 0.0027 0.1791 2.1499e-05 1 utils.py:196
22 _compile_for_args 22.8517 0.0022 0.1818 1.7664e-05 2 dispatcher.py:388
23 load_overload 20.2238 0.0012 0.1609 9.2750e-06 2 caching.py:627
Fast:
func cumpct selfpct cumtime selftime calls path
0 add_edge 16.1580 15.8075 0.1182 1.1560e-01 52196 multidigraph.py:416
1 make_new_cases 45.8463 5.1804 0.3353 3.7882e-02 101 disease.py:247
2 set_prognoses 22.4226 3.4457 0.1640 2.5197e-02 102 disease.py:94
3 append 18.1924 2.0344 0.1330 1.4877e-02 52196 disease.py:379
4 rvs 15.1250 2.0188 0.1106 1.4763e-02 611 distributions.py:520
5 _handle_fromlist 14.3934 0.4099 0.1053 2.9977e-03 4897 <frozen importlib._bootstrap>:1207
6 load_additional_registries 16.4687 0.2482 0.1204 1.8148e-03 248 cpu.py:60
7 step 71.3403 0.2157 0.5217 1.5773e-03 101 sim.py:167
8 set_prognoses 26.4412 0.1851 0.1934 1.3538e-03 102 sir.py:128
9 _set_cases 26.5112 0.1045 0.1939 7.6395e-04 101 disease.py:307
10 run 100.5779 0.0867 0.7355 6.3400e-04 1 sim.py:234
11 _find_and_load 13.8990 0.0332 0.1016 2.4308e-04 88 <frozen importlib._bootstrap>:1165
12 update 14.4932 0.0330 0.1060 2.4125e-04 101 network.py:499
13 _find_and_load_unlocked 13.8288 0.0319 0.1011 2.3337e-04 76 <frozen importlib._bootstrap>:1120
14 refresh 20.9855 0.0296 0.1535 2.1681e-04 248 base.py:261
15 initialize 27.1852 0.0181 0.1988 1.3251e-04 1 sim.py:47
16 _call_with_frames_removed 13.7409 0.0092 0.1005 6.7158e-05 202 <frozen importlib._bootstrap>:233
17 compile 21.7269 0.0064 0.1589 4.7079e-05 2 dispatcher.py:907
18 demo 100.7380 0.0038 0.7367 2.7974e-05 1 sim.py:706
19 set_seed 24.2063 0.0028 0.1770 2.0501e-05 1 utils.py:198
20 _compile_for_args 24.6252 0.0025 0.1801 1.8288e-05 2 dispatcher.py:388
21 load_overload 21.6890 0.0013 0.1586 9.4950e-06 2 caching.py:627
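To make the two profiles easier to compare, the rows can be lined up by function name. The following is only a rough sketch: the numbers are the `selftime` and `calls` values of the top five rows copied from the two tables above, the helper `selftime_ratios` is a hypothetical name, and keying by function name alone is an assumption that only holds for these five rows (`set_prognoses` appears twice further down, in `disease.py` and `sir.py`).

```python
# (selftime in seconds, calls) for the top five rows of each profile above.
# For set_prognoses, the disease.py:94 row is used.
slow = {
    'add_edge':       (1.1399e-01, 51710),
    'set_prognoses':  (2.4744e-02, 102),
    'make_new_cases': (1.9446e-02, 101),
    'append':         (1.4959e-02, 51710),
    'rvs':            (1.4908e-02, 611),
}
fast = {
    'add_edge':       (1.1560e-01, 52196),
    'make_new_cases': (3.7882e-02, 101),
    'set_prognoses':  (2.5197e-02, 102),
    'append':         (1.4877e-02, 52196),
    'rvs':            (1.4763e-02, 611),
}

def selftime_ratios(a, b):
    """Return {func: selftime_a / selftime_b} for functions present in both profiles."""
    return {name: a[name][0] / b[name][0] for name in a if name in b}

ratios = selftime_ratios(slow, fast)

# Print the functions whose self time differs most between the two runs first
for name, r in sorted(ratios.items(), key=lambda kv: abs(kv[1] - 1), reverse=True):
    print(f'{name:<16s} slow/fast selftime = {r:.2f}')
```

A ratio well away from 1 flags a function worth a closer look. A ratio near 1 suggests the function is probably not where the difference comes from.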
Strange. As an aside, I think it would be good to have a couple of different sims in the benchmark so that we cover a wider cross-section of use cases. That's particularly important because bottlenecks can arise in different places depending on things like the number of agents. It might also help identify where slowdowns are coming from, if some benchmarks are affected more than others.
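One possible shape for such a multi-case benchmark is sketched below. It is an assumption-laden illustration, not the project's actual benchmark: the case names and parameter values are made up, `time_case` and `run_benchmarks` are hypothetical helpers, and `fake_sim` is a stand-in workload so the sketch runs without starsim installed. The untimed warm-up run is there because the profiles above include import and Numba compilation frames, which would otherwise be counted in the first timed run.

```python
import statistics
import time

def time_case(func, kwargs, repeats=3, warmup=1):
    """Return the median wall-clock time (s) of func(**kwargs) over `repeats` runs."""
    for _ in range(warmup):  # untimed runs absorb import/JIT-compilation overhead
        func(**kwargs)
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        func(**kwargs)
        times.append(time.perf_counter() - t0)
    return statistics.median(times)

def run_benchmarks(func, cases, **timing_kw):
    """Time each named case and return {name: median seconds}."""
    return {name: time_case(func, kw, **timing_kw) for name, kw in cases.items()}

# Hypothetical benchmark cases: names and values are illustrative only
cases = {
    'small':  dict(n_agents=1_000,   start=2000, end=2020),
    'medium': dict(n_agents=10_000,  start=2000, end=2100),
    'large':  dict(n_agents=100_000, start=2000, end=2020),
}

def fake_sim(n_agents, start, end):
    """Stand-in workload (not a real sim) so the sketch is self-contained."""
    total = 0
    for _ in range(end - start):
        total += sum(range(n_agents // 100))
    return total

results = run_benchmarks(fake_sim, cases)
for name, t in results.items():
    print(f'{name:>8s}: {t:.4f} s')
```

To time real sims, `fake_sim` could be swapped for a small wrapper around `ss.demo(diseases='sis', plot=False, **kw)`, as in the script at the top of this issue. Running the same cases on both commits would then show whether the slowdown depends on the number of agents or the run length.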
PR #546 appears to slow down the sim by nearly 30% -- the benchmark takes 1.3 s to run instead of 1.0 s. Need to fix. Tried `njit` on `ss.combine_rands()`; it doesn't help. Compare SHA 4689698 (`combine-rands`, slow) with 43249e3 (`main`, fast).