Closed hidekb closed 3 years ago
@hidekb we had removed prange from the loops in deterministic.pyx because it was slowing down the code. This was especially a problem for inference, and we got massive speedups by removing it when the contact matrix was 16x16, for example. So I would be very curious to check again how the code performs now as you change the number of threads. Could you please confirm whether prange actually improves performance as the number of threads is changed, for both small and large values of M?
@rajeshrinet I checked the performance of prange for a small contact matrix (M = 16) using ex03-age-structured-SIR-for-India.
Since the lambdas calculation has a double loop, I distinguished the outer loop from the inner loop. The results are as follows:
outer range, inner range: 1.15 ms ± 27.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
outer prange, inner range: 1.15 ms ± 39.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
outer prange, inner prange: 1.19 ms ± 35.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
I think the slowdown was caused by using prange on both loops, although prange does not help this calculation in any case.
By the way, the speedup for a large-M contact matrix without skipping zero elements comes from correctly declaring the variable infective_index. So I agree with removing prange.
@hidekb thanks very much for this useful benchmark. I am also of the view that we should not add prange to the code if it does not give any added performance. Could you please update your pull request accordingly, while retaining the changes to the computation of the contact matrix? Thanks...
To improve the computation speed for large contact matrices, I implemented skipping of zero elements in the contact matrix. Additionally, the loop of the lambdas calculation is parallelized, and I declared the variable infective_index, which is used in the lambdas calculation. The skipping can be turned on or off with constatnt_CM, which is an argument of pyross.deterministic.Spp. If constatnt_CM=0, the skipping is turned off. The default value is constatnt_CM=0.
For a London simulation with around 1000 nodes, the previous implementation takes 10 minutes. The present implementation takes 100 seconds without skipping zero elements, and 20 seconds when the skipping of zero elements is turned on.
However, the PyrossGeo implementation takes 2 seconds for the same simulation when using the non-local interaction model.
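The zero-skipping idea in the description above can be sketched in plain Python. This is only an illustration, not the code in deterministic.pyx: the function names (`nonzero_index`, `lambdas_dense`, `lambdas_skip_zeros`) are hypothetical, and the force-of-infection form `lambda_i = beta * sum_j CM[i, j] * I[j] / N[j]` is an assumed standard age-structured SIR expression, not something taken from the pull request.

```python
import numpy as np


def nonzero_index(CM):
    """For each row of CM, precompute the column indices of its nonzero entries.

    Hypothetical helper: this only pays off when CM is constant in time,
    so the index can be built once and reused at every time step.
    """
    return [np.flatnonzero(row) for row in CM]


def lambdas_dense(beta, CM, I, N):
    """Assumed force of infection, looping over every (i, j) pair."""
    M = CM.shape[0]
    lam = np.zeros(M)
    for i in range(M):
        for j in range(M):
            lam[i] += beta * CM[i, j] * I[j] / N[j]
    return lam


def lambdas_skip_zeros(beta, CM, I, N, index):
    """Same sum, but the inner loop visits only nonzero entries of row i."""
    M = CM.shape[0]
    lam = np.zeros(M)
    for i in range(M):
        for j in index[i]:
            lam[i] += beta * CM[i, j] * I[j] / N[j]
    return lam
```

Both functions return the same values, because the skipped terms contribute exactly zero to each sum; the saving comes from the shorter inner loop, so it grows with the fraction of zero entries in the contact matrix.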