karimn / covid-19-transmission


Check how runtime changes adding more countries and subnational units #4

Closed karimn closed 4 years ago

karimn commented 4 years ago

Currently profiling how the runtime changes as we go from 10 to 100 subnational units.

wwiecek commented 4 years ago

Different areas seem to have different computational demands: I ran the script with 5 randomly chosen units and got 90 seconds the first time, 330 seconds the second time.

karimn commented 4 years ago

That's probably because the number of observed days is not the same for each area. This is only meant to give a rough estimate. One alternative is to use prior predictive simulation to generate synthetic data from a single area and use N draws as if they were N different locations.
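The prior-predictive idea could look roughly like this (a minimal Python sketch with a toy growth model; the function name, priors, and parameters are all illustrative assumptions, not the repo's actual Stan model):

```python
import random

def prior_predictive_location(n_days, rng):
    # Toy generative model for illustration only: draw a daily growth
    # factor from an assumed prior and simulate a case-count series.
    growth = rng.lognormvariate(0.05, 0.02)  # assumed prior, not the repo's
    cases, series = 10.0, []
    for _ in range(n_days):
        cases *= growth
        series.append(int(cases))
    return series

# N draws, each with the same number of observed days, treated as if
# they were N different locations -- so runtime scales cleanly with N.
rng = random.Random(42)
synthetic_locations = [prior_predictive_location(60, rng) for _ in range(10)]
```

Since every synthetic location has the same length, this would remove the per-area variation in observed days that makes the real-data timings noisy.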

wwiecek commented 4 years ago

> That's probably because the number of observed days is not the same for each area. This is only meant to give a rough estimate. One alternative is to use prior predictive simulation to generate synthetic data from a single area and use N draws as if they were N different locations.

Ah, yes, I forgot about the N days loop. Running for a random sample of 10 gave 440 secs, a random sample of 20 gave 926 secs, so my current prior is that runtime is linear in the number of units. N=40 is still running (I do this on my tiny laptop, which has a good CPU but a lot of thermal throttling, so it's probably slow overall).

karimn commented 4 years ago

Yes, I can't wait to have that cluster to run things somewhere else.

This is what I have so far:

speed_10.log:Sampling: 804.461 sec elapsed
speed_20.log:Sampling: 651.897 sec elapsed
speed_30.log:Sampling: 2666.915 sec elapsed
speed_40.log:Sampling: 4235.891 sec elapsed
speed_50.log:Sampling: 3409.4 sec elapsed
speed_60.log:Sampling: 6689.687 sec elapsed
speed_70.log:Sampling: 8508.598 sec elapsed
speed_80.log:Sampling: 6532.451 sec elapsed

wwiecek commented 4 years ago

N=40 gave 2137 secs.

All of these suggest the computation time will be linear in the number of regions (with big variation, because not all regions are alike).

karimn commented 4 years ago

../temp/speed_100.log:Sampling: 8770.373 sec elapsed
../temp/speed_10.log:Sampling: 804.461 sec elapsed
../temp/speed_20.log:Sampling: 651.897 sec elapsed
../temp/speed_30.log:Sampling: 2666.915 sec elapsed
../temp/speed_40.log:Sampling: 4235.891 sec elapsed
../temp/speed_50.log:Sampling: 3409.4 sec elapsed
../temp/speed_60.log:Sampling: 6689.687 sec elapsed
../temp/speed_70.log:Sampling: 8508.598 sec elapsed
../temp/speed_80.log:Sampling: 6532.451 sec elapsed
../temp/speed_90.log:Sampling: 10989.68 sec elapsed
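As a quick sanity check of the linear-scaling hypothesis, here is a least-squares fit of the timings above against the number of units (plain-Python sketch; the numbers are copied from the log lines, everything else is illustrative):

```python
# Least-squares fit of sampling time vs. number of subnational units,
# using the timings reported in the speed_*.log lines above.
n_units = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
seconds = [804.461, 651.897, 2666.915, 4235.891, 3409.4,
           6689.687, 8508.598, 6532.451, 10989.68, 8770.373]

n_mean = sum(n_units) / len(n_units)
t_mean = sum(seconds) / len(seconds)
sxy = sum((n - n_mean) * (t - t_mean) for n, t in zip(n_units, seconds))
sxx = sum((n - n_mean) ** 2 for n in n_units)
slope = sxy / sxx                      # roughly 109 s per additional unit
intercept = t_mean - slope * n_mean
print(f"slope = {slope:.1f} s/unit, intercept = {intercept:.0f} s")
```

The fit is consistent with linear scaling, though the residuals are large (e.g. N=80 is faster than N=70), which matches the point that regions differ in their computational demands.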

wwiecek commented 4 years ago

Big variation but still linear, right?

BTW I just set up a laptop for running more models in the background (I remembered I had an old company laptop that's a bit broken but with good CPU so I put Ubuntu on it). If you want to just give me some jobs to run, lmk

karimn commented 4 years ago

Thanks, I have 12 cores on my laptop so I'm usually able to run three different runs simultaneously but it kind of prevents me from working on the model. I'll let you know.

Ok, I think there are a number of different issues here. I ran three different scenarios:

  1. US only, with 652 counties
  2. Italy (20 subnational) + Portugal (1 subnational)
  3. Argentina + Australia + Canada + Portugal, a group of countries with few subnational units.

For (1), there is something wrong with the model because some chains are way faster than others. I'm thinking of rerunning but using states as the "country" to see if this makes a difference. It will still be a huge run.

For (2), the model ran to completion with no problems but took ~32,000 seconds. I'm now running it again with Italy only to check whether just adding one country with a single subnational unit makes a huge difference.

For (3), the model ran to completion without a problem and took ~8000 seconds to complete. I'm incrementally adding countries.

wwiecek commented 4 years ago

I think (1) is a time sinkhole, in that it might be down to underlying data issues, and we don't care about results for that many units until we have the simpler models running. What do you think?

For (2), running only 2 countries is not a great test, as the hierarchical model might be thrown off. A few countries plus one country with subnational data would be a better test.

(3) sounds great (still a tad slower than what we need, but hey, it runs). And maybe this also supports what I said in the previous paragraph?

karimn commented 4 years ago

I agree that (1) is a sinkhole, so I'll put this whole bug on the back burner for now and focus on incrementally adding reasonably sized countries. I'm going to open a separate bug for the US data issue to get to later.

I'm not yet sure how to compare (2) and (3), since (2) has more subnational units in total. I'll know more when we add Italy to scenario (3). My take on (2) is that the hierarchical model should not be thrown off: if the data aren't enough, the prior should keep results in a reasonable range.

karimn commented 4 years ago

#7 is the new bug for the US data.

karimn commented 4 years ago

Update on Italy + Portugal vs Italy only runtime:

karimn commented 4 years ago

Closing this. Two potential ways to improve the model's runtime are #3 and #5.

wwiecek commented 4 years ago

BTW, given what you've seen so far, do you think it's feasible to run all ~300 regions together at some point? (Even if it takes days.) Once we have a more rounded-off model and want to report on it, we'll probably have to throw it all into a single hierarchical structure.

karimn commented 4 years ago

We could try that. I do need to provide an upper bound if I run this on the cluster server. I would guess that with around 300 it could take a week!

wwiecek commented 4 years ago

> We could try that. I do need to provide an upper bound if I run this on the cluster server. I would guess that with around 300 it could take a week!

Does ~40 take 1 day or is it non-linear after all?

karimn commented 4 years ago

I can't actually remember. I only remember that the US with 50 sub-regions looked like it would take days. We could start something that runs for a week; the cluster server should handle it, particularly if all we're using is 4 cores.

On Fri, Jul 17, 2020 at 5:26 PM Witold Wiecek notifications@github.com wrote:

> We could try that. I do need to provide an upper bound if I run this on the cluster server. I would guess that with around 300 it could take a week!
>
> Does ~40 take 1 day or is it non-linear after all?
