Various things to look at

akesandgren commented 4 years ago

This is a list of things that need to be looked at, it is by no means complete, so if anyone thinks of anything else just add a comment to this issue.

One of the goals is to be able to run with the full population of Sweden for at least 500 days. And to run multiple scenarios with varying parameters.

Strategies for parallelizing, both OpenMP and MPI if possible.
Memory footprint need to be reduced, this might very well include restructuring heavily. This needs to be done carefully so as to not disrupt Jasmines development.
Make the code more modular, needs to be done slowly and carefully and in constant contact with me or Jasmine so she can continue to develop the interactions.

zao commented 4 years ago

I have a feeling that in order to scale up distance-based interactions there could be some sort of efficient spatial lookup for queries of the "everyone within 40 km" type.

In a Cartesian coordinate system distance queries can be supported by spatial hashes and/or dense grid cells, while in a lat-long coordinate system distance computations are more involved and finding which parts of the coordinate system are relevant is harder.

pojeda commented 4 years ago

Maybe a precomputed matrix with the distances would be enough unless dynamics is introduced in the model such as in lattice-based ones.

weizhanguio commented 4 years ago

I agree we should have a precomputed matrix with distances because the code now has repeated and heavy computation there.

weizhanguio commented 4 years ago

Can we test the time usage for the code now and see what benefit we can get from our changes?

akesandgren commented 4 years ago

Current master is most likely working as intended, barring any outstanding bugs introduced by me. So you can start analysing it.

zao commented 4 years ago

@pojeda @weizhanguio A dense matrix of distances for a population of 10M would take a few hundred terabytes of memory depending on precision.

weizhanguio commented 4 years ago

@zao Depending how you implement, and what's the matrix. I guess reduce the computation it is feasible. In fact I guess this code should be memory-bound.

zao commented 4 years ago

The dense matrix would be everyone-to-everyone.

A full list of "nearby" people per person would be less demanding storage-wise but could take quite some time to compute as it's quadratic and in densely populated areas it would still be quite wide.

My current idea is to accelerate on-demand queries by segmenting up-front by bands of latitude as distances are easier to compute north-south, and then inside each band filtering out individuals that are too far west/east to matter.

pojeda commented 4 years ago

@zao I was thinking about computing the relevant distances just once and store them in a matrix. The full matrix would be huge as you mentioned. In the code, there is a cutoff of 40km. BTW, I wonder if this cutoff can capture the relevant interactions as the function f(d) is of the order 10^-3 at the cutoff which seems to be a significant value.

weizhanguio commented 4 years ago

Hi, anyone check the value "infect"? Always zero? commit 375aa0e. Any criteria to check correctness of modification? For example, value of some variables?

jasminegardner commented 4 years ago

Infect initializes to zero for each susceptible person and adds a small number when it finds a contact in a workplace, community, or household. It should be zero for most pairs if they are not in contact. That's where a pair list would definitely help because it is loop through all infectious people for each susceptible person even though the majority are not in contact.

Yes, the community distance should probably be greater than 40. Maybe 100km? It is in essence the maximum distance that we would assume two people would be in contact. I would have to look at the actual values of the function.

On Wed, Mar 25, 2020 at 10:57 AM weizhanguio notifications@github.com wrote:

Hi, anyone check the value "infect"? Always zero? commit 375aa0e https://github.com/kassonlab/covid19-epi/commit/375aa0e6669f75b8a9c403ec688bc8247964d954. Any criteria to check correctness of modification? For example, value of some variables?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/kassonlab/covid19-epi/issues/5#issuecomment-603748263, or unsubscribe https://github.com/notifications/unsubscribe-auth/AIDXMFTH7FXAIJKXLYVT5ATRJHIPHANCNFSM4LSZA6MA .

-- Jasmine Gardner, PhD Postdoctoral Researcher Department of Chemistry - BMC Uppsala University

pojeda commented 4 years ago

At 100km f(d) ~ 6x10^-5 140km ~ 2x10^-5 650 km (roughly) distance from Umeå to Stockholm) f(d) ~ 2x10^-7
how about using a constant value for pair interactions beyond the cutoff?

jasminegardner commented 4 years ago

I'm fine with that but then we cannot split the population for parallelization, correct? Or I guess we could make it a constant based on the value of infected people outside of the distance range. This would have to be sufficiently small because if a person in Skåne is feeling the effects of everyone in Stockholm, their chance of infection would be very high despite a very low change of these people ever coming in contact. This solution is fine but let me chat with Peter about the correct cutoff value and protocol.

On Wed, Mar 25, 2020 at 12:40 PM pojeda notifications@github.com wrote:

At 100km f(d) ~ 6x10^-5 140km ~ 2x10^-5 650 km (roughly) distance from Umeå to Stockholm) f(d) ~ 2x10^-7 how about using a constant value for pair interactions beyond the cutoff?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/kassonlab/covid19-epi/issues/5#issuecomment-603792963, or unsubscribe https://github.com/notifications/unsubscribe-auth/AIDXMFUUK6WW6TSEEGCL2MDRJHURBANCNFSM4LSZA6MA .

-- Jasmine Gardner, PhD Postdoctoral Researcher Department of Chemistry - BMC Uppsala University

pojeda commented 4 years ago

one more comment, using a Taylor expansion for f(d), the function (4/x)^3 approximates well the original f(d) for large distances and it is less expensive. This is in case you want to compute distances on the fly. For instance, starting at 40km, the function (4/x)^3 differs from f(d) by 10^-6.

peterkasson commented 4 years ago

Yes, I have some thoughts on this. Let's chat today. (There's also the general idea of including just enough complexity but not too much.)

Best, --Peter

Peter Kasson, MD, PhD Associate Professor Departments of Molecular Physiology and Biological Physics and of Biomedical Engineering University of Virginia and Department of Cell and Molecular Biology, Uppsala University

On Wed, Mar 25, 2020 at 5:12 AM Jasmine Gardner notifications@github.com wrote:

I'm fine with that but then we cannot split the population for parallelization, correct? Or I guess we could make it a constant based on the value of infected people outside of the distance range. This would have to be sufficiently small because if a person in Skåne is feeling the effects of everyone in Stockholm, their chance of infection would be very high despite a very low change of these people ever coming in contact. This solution is fine but let me chat with Peter about the correct cutoff value and protocol.

On Wed, Mar 25, 2020 at 12:40 PM pojeda notifications@github.com wrote:

At 100km f(d) ~ 6x10^-5 140km ~ 2x10^-5 650 km (roughly) distance from Umeå to Stockholm) f(d) ~ 2x10^-7 how about using a constant value for pair interactions beyond the cutoff?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub < https://github.com/kassonlab/covid19-epi/issues/5#issuecomment-603792963>, or unsubscribe < https://github.com/notifications/unsubscribe-auth/AIDXMFUUK6WW6TSEEGCL2MDRJHURBANCNFSM4LSZA6MA

.

-- Jasmine Gardner, PhD Postdoctoral Researcher Department of Chemistry - BMC Uppsala University

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/kassonlab/covid19-epi/issues/5#issuecomment-603805444, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACRWZNJS6TXTC5JT5Q5T743RJHYMHANCNFSM4LSZA6MA .

weizhanguio commented 4 years ago

Is it possible to use only "float" or only "double"? Mixing "float" and "double" leads to many tricky bugs.

jasminegardner commented 4 years ago

Yes, you can change everything to float. The double values aren't necessary anymore. They were when I was trying to find probabilities of very small numbers but it isn't an issue anymore.

On Thu, Mar 26, 2020 at 12:56 PM weizhanguio notifications@github.com wrote:

Is it possible to use only "float" or only "double"? Mixing "float" and "double" leads to many tricky bugs.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/kassonlab/covid19-epi/issues/5#issuecomment-604388542, or unsubscribe https://github.com/notifications/unsubscribe-auth/AIDXMFW7IXZKQRFK3XSJSOTRJM7HVANCNFSM4LSZA6MA .

-- Jasmine Gardner, PhD Postdoctoral Researcher Department of Chemistry - BMC Uppsala University

zao commented 4 years ago

Not directly pertaining to the factors computed but I would like to document it anyway:

The distance computation itself needs double precision inside as it's using the cheapest and least stable distance computation (topmost at https://en.wikipedia.org/wiki/Great-circle_distance#Formulae). It's quite moody at shorter distances in my spatial partitioning code.

jasminegardner commented 4 years ago

I stole the code off the internet. If there is a better way to calculate, we can change it. Please let me know.

On Thu, Mar 26, 2020 at 3:37 PM Lars Viklund notifications@github.com wrote:

Not directly pertaining to the factors computed but I would like to document it anyway:

The distance computation itself needs double precision inside as it's using the cheapest and least stable distance computation (topmost at https://en.wikipedia.org/wiki/Great-circle_distance#Formulae). It's quite moody at shorter distances in my spatial partitioning code.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/kassonlab/covid19-epi/issues/5#issuecomment-604467464, or unsubscribe https://github.com/notifications/unsubscribe-auth/AIDXMFWHS72IKLVRXCUAIZ3RJNSDNANCNFSM4LSZA6MA .

-- Jasmine Gardner, PhD Postdoctoral Researcher Department of Chemistry - BMC Uppsala University

weizhanguio commented 4 years ago

commit 6e4feb9: if change to float, output herehereherehereherepropbablilities for age doesn't sum to 1 within 0.000000 Bailing out on age initialization

akesandgren commented 4 years ago

Yes, don't touch the doubles yet. They need to be carefully changed to float if at all, it's probably better to use double all over. But we'll look into that once we've gotten initial production started and can start looking at optimizations more closely. I used a epsilon requiring double precision for the probability additive vector creation.

weizhanguio commented 4 years ago

I agree double is better. If mix float and double, it will be problematic for checking output.

pojeda commented 4 years ago

Hi,

here there is a profiling analysis of the current program, maybe this can be useful to optimize the functions and to parallelize the heavier ones.

covid19_profiling

weizhanguio commented 4 years ago

Hi, anyone test OpenMP code and time usage report?

pojeda commented 4 years ago

here there is a profiling analysis for the OpenMP code using 28 threads.

covid19_omp

RogerJL commented 4 years ago

I'm fine with that but then we cannot split the population for parallelization, correct? Or I guess we could make it a constant based on the value of infected people outside of the distance range. This would have to be sufficiently small because if a person in Skåne is feeling the effects of everyone in Stockholm, their chance of infection would be very high despite a very low change of these people ever coming in contact. This solution is fine but let me chat with Peter about the correct cutoff value and protocol. … On Wed, Mar 25, 2020 at 12:40 PM pojeda @.***> wrote: At 100km f(d) ~ 6x10^-5 140km ~ 2x10^-5 650 km (roughly) distance from Umeå to Stockholm) f(d) ~ 2x10^-7 how about using a constant value for pair interactions beyond the cutoff? — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#5 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AIDXMFUUK6WW6TSEEGCL2MDRJHURBANCNFSM4LSZA6MA . -- Jasmine Gardner, PhD Postdoctoral Researcher Department of Chemistry - BMC Uppsala University

Why simulate at individuals (Skåne) when group size and distance affects results? Shouldn't infection of someone non traveler in Skåne be infected by an infected person travelling habits (and simulating individuals traveling in the model)

jasminegardner commented 4 years ago

This is long resolved and no longer relevant.

kassonlab / covid19-epi

Various things to look at #5