starsimhub / starsim

Starsim disease modeling framework
http://starsim.org
MIT License
12 stars 5 forks source link

Debug EmbeddingNet performance #414

Open cliffckerr opened 5 months ago

cliffckerr commented 5 months ago

More than 10x slower than random

kaa2102 commented 4 months ago

The EmbeddingNet includes the assignment/marriage problem to pair male and female couples. This is facilitated by the scipy function scipy.optimize.linear_sum_assignment which uses the Hungarian algorithm with a time complexity of O(n^3). This is why the EmbeddingNet takes an extremely long time with large distance matrix inputs to the scipy.optimize.linear_sum_assignment when the number of agents exceeds 10,000.

Other algorithms and techniques like Jonker-Volgenant algorithm or the Simplex method have time complexity of O(n^2). However, the use of the scipy.optimize.linear_sum_assignment function is essential to minimizing stochastic noise.

File: network.py, class EmbeddingNet(), Function add_pairs()

dist_mat = spsp.distance_matrix(loc_m[:, np.newaxis], loc_f[:, np.newaxis])

ind_m, ind_f = spo.linear_sum_assignment(dist_mat)

A potential solution (rather than using a less efficient algorithm) is to partition the distance matrix to more efficient sizes to enhance performance.

kaa2102 commented 4 months ago

In diagnosing performance issue, I noticed that on simulation initialization, the EmbeddingNet generates a distance matrix that is 50x the size of subsequent sim runs. This line of code also hits the scipy.optimize.linear_sum_assignment() function.

`dist_mat = spsp.distance_matrix(loc_m[:, np.newaxis], loc_f[:, np.newaxis])

ind_m, ind_f = spo.linear_sum_assignment(dist_mat) `

Starsim 0.2.10 (2024-03-18) — © 2023-2024 by IDM
Initializing sim with 1000 agents
Distance Matrix Shape =(358, 404)
  Running 2000.0 ( 0/50) (0.54 s)  ———————————————————— 2%
Distance Matrix Shape =(8, 58)
Distance Matrix Shape =(9, 54)
Distance Matrix Shape =(9, 50)
Distance Matrix Shape =(18, 52)
Distance Matrix Shape =(10, 42)
Distance Matrix Shape =(5, 39)
Distance Matrix Shape =(7, 45)
Distance Matrix Shape =(10, 42)
Distance Matrix Shape =(9, 50)
kaa2102 commented 4 months ago

Additional run data with different n_agents values.

Starsim 0.2.10 (2024-03-18) — © 2023-2024 by IDM
Initializing sim with 5000 agents
1874 1879
  Running 2000.0 ( 0/50) (1.67 s)  ———————————————————— 2%
55 46
56 47
48 55
54 59
59 42
73 46
75 48
81 52
67 51

Starsim 0.2.10 (2024-03-18) — © 2023-2024 by IDM
Initializing sim with 2000 agents
744 763
  Running 2000.0 ( 0/50) (0.41 s)  ———————————————————— 2%
18 38
19 42
13 41
20 61
26 61
22 60
18 58
15 59
20 59