Improve transmission performance

cliffckerr commented 2 months ago

PR #546 appears to slow down the sim by about 30%: the benchmark takes 1.3 s to run instead of 1.0 s. This needs fixing. I tried applying njit to ss.combine_rands(), but it doesn't help.

Compare SHA 4689698 (combine-rands, slow) with 43249e3 (main, fast).

cliffckerr commented 2 months ago

Script:

import sciris as sc
import starsim as ss

# Benchmark settings: 10,000 agents, SIS disease, 2000-2100, no plotting
kw = dict(n_agents=10e3, start=2000, end=2100, diseases='sis', plot=False)

prof = True  # True: profile the run; False: just time it

if prof:
    context = sc.cprofile(sort='selfpct', mintime=1e-1)
else:
    context = sc.timer()

with context:
    sim = ss.demo(**kw)

Slow:

                          func    cumpct  selfpct  cumtime    selftime  calls                                        path
0                     add_edge   14.6439  14.3256   0.1165  1.1399e-01  51710                         multidigraph.py:416
1                set_prognoses   20.3468   3.1097   0.1619  2.4744e-02    102                               disease.py:94
2               make_new_cases   49.8485   2.4438   0.3967  1.9446e-02    101                              disease.py:247
3                       append   16.5239   1.8800   0.1315  1.4959e-02  51710                              disease.py:377
4                          rvs   17.2986   1.8735   0.1376  1.4908e-02    611                        distributions.py:520
5             _handle_fromlist   13.5303   0.3817   0.1077  3.0370e-03   4897          <frozen importlib._bootstrap>:1207
6   load_additional_registries   15.4045   0.2278   0.1226  1.8130e-03    248                                   cpu.py:60
7                         step   73.3433   0.1977   0.5836  1.5733e-03    101                                  sim.py:167
8                set_prognoses   23.9644   0.1740   0.1907  1.3842e-03    102                                  sir.py:128
9                   _set_cases   24.0492   0.1045   0.1914  8.3182e-04    101                              disease.py:305
10                         run  100.5591   0.0824   0.8002  6.5575e-04      1                                  sim.py:234
11              _find_and_load   13.0682   0.0316   0.1040  2.5143e-04     88          <frozen importlib._bootstrap>:1165
12                      update   13.2567   0.0308   0.1055  2.4541e-04    101                              network.py:499
13     _find_and_load_unlocked   13.0037   0.0307   0.1035  2.4405e-04     76          <frozen importlib._bootstrap>:1120
14                     refresh   19.5396   0.0271   0.1555  2.1549e-04    248                                 base.py:261
15              _load_unlocked   12.7827   0.0221   0.1017  1.7560e-04     75           <frozen importlib._bootstrap>:666
16                 exec_module   12.6857   0.0142   0.1009  1.1322e-04     74  <frozen importlib._bootstrap_external>:934
17   _call_with_frames_removed   12.9244   0.0092   0.1028  7.2965e-05    202           <frozen importlib._bootstrap>:233
18                  initialize   25.3744   0.0060   0.2019  4.7907e-05      1                                   sim.py:47
19                     compile   20.2575   0.0057   0.1612  4.5254e-05      2                           dispatcher.py:907
20                        demo  100.7041   0.0036   0.8013  2.8979e-05      1                                  sim.py:706
21                    set_seed   22.5082   0.0027   0.1791  2.1499e-05      1                                utils.py:196
22           _compile_for_args   22.8517   0.0022   0.1818  1.7664e-05      2                           dispatcher.py:388
23               load_overload   20.2238   0.0012   0.1609  9.2750e-06      2                              caching.py:627

Fast:

                          func    cumpct  selfpct  cumtime    selftime  calls                                path
0                     add_edge   16.1580  15.8075   0.1182  1.1560e-01  52196                 multidigraph.py:416
1               make_new_cases   45.8463   5.1804   0.3353  3.7882e-02    101                      disease.py:247
2                set_prognoses   22.4226   3.4457   0.1640  2.5197e-02    102                       disease.py:94
3                       append   18.1924   2.0344   0.1330  1.4877e-02  52196                      disease.py:379
4                          rvs   15.1250   2.0188   0.1106  1.4763e-02    611                distributions.py:520
5             _handle_fromlist   14.3934   0.4099   0.1053  2.9977e-03   4897  <frozen importlib._bootstrap>:1207
6   load_additional_registries   16.4687   0.2482   0.1204  1.8148e-03    248                           cpu.py:60
7                         step   71.3403   0.2157   0.5217  1.5773e-03    101                          sim.py:167
8                set_prognoses   26.4412   0.1851   0.1934  1.3538e-03    102                          sir.py:128
9                   _set_cases   26.5112   0.1045   0.1939  7.6395e-04    101                      disease.py:307
10                         run  100.5779   0.0867   0.7355  6.3400e-04      1                          sim.py:234
11              _find_and_load   13.8990   0.0332   0.1016  2.4308e-04     88  <frozen importlib._bootstrap>:1165
12                      update   14.4932   0.0330   0.1060  2.4125e-04    101                      network.py:499
13     _find_and_load_unlocked   13.8288   0.0319   0.1011  2.3337e-04     76  <frozen importlib._bootstrap>:1120
14                     refresh   20.9855   0.0296   0.1535  2.1681e-04    248                         base.py:261
15                  initialize   27.1852   0.0181   0.1988  1.3251e-04      1                           sim.py:47
16   _call_with_frames_removed   13.7409   0.0092   0.1005  6.7158e-05    202   <frozen importlib._bootstrap>:233
17                     compile   21.7269   0.0064   0.1589  4.7079e-05      2                   dispatcher.py:907
18                        demo  100.7380   0.0038   0.7367  2.7974e-05      1                          sim.py:706
19                    set_seed   24.2063   0.0028   0.1770  2.0501e-05      1                        utils.py:198
20           _compile_for_args   24.6252   0.0025   0.1801  1.8288e-05      2                   dispatcher.py:388
21               load_overload   21.6890   0.0013   0.1586  9.4950e-06      2                      caching.py:627
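
As a rough aid for reading the two tables, the sketch below diffs cumtime per function between the slow and fast runs. The values are copied from a handful of rows above; the `slow` and `fast` dictionaries and the `cumtime_deltas` helper are illustrative names, not part of Starsim or Sciris.

```python
# Cumulative times (s) copied from selected rows of the tables above,
# keyed by (function, file) because line numbers shift between commits.
slow = {
    ('run', 'sim.py'): 0.8002,
    ('step', 'sim.py'): 0.5836,
    ('make_new_cases', 'disease.py'): 0.3967,
    ('set_prognoses', 'disease.py'): 0.1619,
    ('rvs', 'distributions.py'): 0.1376,
    ('add_edge', 'multidigraph.py'): 0.1165,
}
fast = {
    ('run', 'sim.py'): 0.7355,
    ('step', 'sim.py'): 0.5217,
    ('make_new_cases', 'disease.py'): 0.3353,
    ('set_prognoses', 'disease.py'): 0.1640,
    ('rvs', 'distributions.py'): 0.1106,
    ('add_edge', 'multidigraph.py'): 0.1182,
}

def cumtime_deltas(slow, fast):
    """Return (key, slow - fast) pairs for shared keys, largest slowdown first."""
    shared = slow.keys() & fast.keys()
    deltas = [(key, slow[key] - fast[key]) for key in shared]
    return sorted(deltas, key=lambda item: item[1], reverse=True)

deltas = cumtime_deltas(slow, fast)
for (func, filename), delta in deltas:
    print(f'{func:>16s} {filename:<18s} {delta:+.4f} s')
```

Since cumtime includes time spent in callees, the entries for run, step, and make_new_cases most likely overlap rather than add up, so the deltas point to where to look and are not independent costs.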

RomeshA commented 2 months ago

Strange. As an aside, I think it would be good to have a few different sims in the benchmark so that we cover a wider cross-section of use cases. That's particularly important because bottlenecks can arise in different places depending on things like the number of agents. It might also help identify where slowdowns are coming from, if some benchmarks are affected more than others.
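
A benchmark suite along these lines might look like the sketch below. It assumes the keyword arguments from the script earlier in this thread and varies only n_agents. The `time_cases` and `starsim_case` helpers and the case labels are hypothetical names, and Starsim is imported lazily so the timing harness itself depends only on the standard library.

```python
import time

def time_cases(cases, repeats=3):
    """Run each zero-argument callable `repeats` times; return the best wall time per label."""
    results = {}
    for label, func in cases.items():
        best = float('inf')
        for _ in range(repeats):
            start = time.perf_counter()
            func()
            best = min(best, time.perf_counter() - start)
        results[label] = best
    return results

def starsim_case(**overrides):
    """Build a callable that runs ss.demo() with this thread's settings plus overrides."""
    def run():
        import starsim as ss  # Imported lazily; requires Starsim to be installed
        kw = dict(n_agents=10e3, start=2000, end=2100, diseases='sis', plot=False)
        kw.update(overrides)
        return ss.demo(**kw)
    return run

# Hypothetical cases: the same SIS demo at three population sizes
cases = {
    'sis_1k': starsim_case(n_agents=1e3),
    'sis_10k': starsim_case(n_agents=10e3),
    'sis_100k': starsim_case(n_agents=100e3),
}
```

Calling `time_cases(cases)` on each commit and comparing the results per label would show whether a slowdown scales with population size. Taking the best of several repeats keeps one-off costs, such as what appears to be JIT compilation in the profiles above, from dominating the result.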

starsimhub / starsim

Improve transmission performance #568