mrc-ide / covid-sim

This is the COVID-19 CovidSim microsimulation model developed by the MRC Centre for Global Infectious Disease Analysis hosted at Imperial College, London.
GNU General Public License v3.0

Relationship between admin units, bounding box, and 'valid' microcells #374

Closed kevinminors closed 4 years ago

kevinminors commented 4 years ago

Hello all,

I am rebuilding parameter files for this model and I can't figure out the relationship between admin units, bounding box, and 'valid' microcells. I am having a few issues surrounding this.

I'll include the model output and population density file data as I think it may be relevant. Here is the output from the model:

Update step = 0.250000
Sampling step = 1.000000
Updates per sample=4
TimeStepsPerDay=4.000000
Using 1 administrative units
Parameters read
Scanning population density file
Adjusted bounding box = (-64.725, 32.325)- (-64.425, 32.625)
Number of cells = 16 (4 x 4)
Population size = 63779
Number of microcells = 1296
Bitmap width = 20
Bitmap height = 28
Coords xmcell=781.258 m ymcell = 926.107 m
Density file contains 9 datapoints.
1 valid microcells read from density file.
Binary density file should contain 1 microcells.
Saving population density file with NC=1...
Population files read.
[C:\Users\KevinMinors\covid-sim\src\Rand.cpp line 1037] N < 0 in IGNBIN

The population density file data is:

-64.683118 32.38194 5659 60 60100
-64.728963 32.342663 5584 60 60200
-64.730012 32.316171 5984 60 60300
-64.75215 32.30085 7087 60 60400
-64.783649 32.299971 11160 60 60500
-64.780009 32.27843 5899 60 60600
-64.807496 32.267312 9002 60 60700
-64.860018 32.252738 6421 60 60800
-64.869676 32.296957 6983 60 60900

I imagine the issue here is something around trying to adjust the parameters for a smaller country. And somehow, only having 1 microcell is causing problems for the binomial distribution in Rand.cpp.
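One quick way to probe the bounding-box question is to test each density-file row against the adjusted box printed in the output. The sketch below assumes (nothing in this thread confirms it) that a row only counts as 'valid' when its coordinates lie inside the box, with inclusive lower bounds and exclusive upper bounds. `BBOX`, `POINTS`, and `inside` are illustrative names, not identifiers from the model.

```python
# Adjusted bounding box as printed in the model output:
# (min_lon, min_lat, max_lon, max_lat)
BBOX = (-64.725, 32.325, -64.425, 32.625)

# (longitude, latitude, population) taken from the density file rows.
POINTS = [
    (-64.683118, 32.38194, 5659),
    (-64.728963, 32.342663, 5584),
    (-64.730012, 32.316171, 5984),
    (-64.75215, 32.30085, 7087),
    (-64.783649, 32.299971, 11160),
    (-64.780009, 32.27843, 5899),
    (-64.807496, 32.267312, 9002),
    (-64.860018, 32.252738, 6421),
    (-64.869676, 32.296957, 6983),
]


def inside(lon, lat, bbox=BBOX):
    """Assumed rule: lower bounds inclusive, upper bounds exclusive."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon < max_lon and min_lat <= lat < max_lat


kept = [p for p in POINTS if inside(p[0], p[1])]
total_population = sum(p[2] for p in POINTS)

print(len(kept), "of", len(POINTS), "rows fall inside the box")
print("total population in file:", total_population)
```

Under that assumption only the first row falls inside the box, which would be consistent with the "1 valid microcells" line, and the nine populations sum to 63779, matching the reported population size.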

Firstly, why is only one admin unit being used? I have nine in the population density file.

Secondly, what makes a microcell 'valid'? What is the relationship between this and the bounding box? What is the relationship between microcells and admin units?

Finally, I'm using the UK parameter files as a foundation. How do I need to adjust them to apply the model to a significantly smaller country?

Forgive me if I've missed this info somewhere or if the answer is really obvious. I'm new to C++ and I couldn't quite find the answer in the code. Thanks!

weshinsley commented 4 years ago

As a bit of a pre-warning: if you're new to spatial/epi modelling, and new to C++, it may be that you'd need more support than we have capacity for at the moment. It may be more useful to you to explore the current working runs rather than trying to create your own theoretical one from the ground up, which is what I think you're trying to do? If so, start with the existing ones; we're working on more documentation, but our priority is live modelling at the moment.

As an immediate observation, what's the source of your population data? Not sure what grid alignment it has, but it does not seem consistent with standard worldpop or landscan.

kevinminors commented 4 years ago

Yes, I've explored the UK sample run. It was really helpful. I'm now trying to build a run for Bermuda.

Ah, interesting. I didn't know the grid alignment had to be consistent with worldpop or landscan. Is there anything else you can tell me about the requirements for building the population density file?

weshinsley commented 4 years ago

For that, you'd need to get shapefiles for Bermuda (eg, www.gadm.org), at the level of admin detail that you want. You'd then want to mash those with spatial population data (eg, worldpop), to give you a list of cells by lower-left longitude/latitude, the population in each cell, and the admin unit which that cell (primarily) occupies.

kevinminors commented 4 years ago

Ok, I've got the shape files for the appropriate admin unit detail level from gadm.org and I have the .tif file from worldpop for the Bermuda spatial population data. How exactly do I 'mash' them together? What software do I need to do this?

bbolker commented 4 years ago

You're probably going to need a crash course in spatial data handling. GDAL and GEOS are the two main libraries for handling geospatial data; they're open source and there are bindings in many languages (Python, R, ...). Unfortunately I can't tell you exactly what to do.

kevinminors commented 4 years ago

Ok, I'm happy to take a course. Which course should I take?

Was GDAL or GEOS used to join the shape files and the spatial population data into the format mentioned above, "a list of cells by lower-left longitude/latitude, the population in each cell, and the admin unit which that cell (primarily) occupies"?

Why can't you tell me exactly what to do?

weshinsley commented 4 years ago

It's a basic GIS procedure. Your worldpop data is a grid of numbers telling you the population in each cell of (1/120) degree lon/lat. The shape files give you polygons for where the boundaries are. What you want is to mash the two together, so that (a) you work out which pixels are inside the shapes for the country you want to model, and then (b) for each of those pixels, you look up which admin unit it is in.
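Steps (a) and (b) can be sketched in plain Python with an even-odd ray-casting test. This is only an illustration under stated assumptions: each pixel is classified by its centre point, polygons are simple lists of (lon, lat) vertices, and the two rectangular "admin units" are made up. Real shapefiles would need a proper reader (for example via GDAL) and handling of multi-part polygons.

```python
CELL = 1.0 / 120.0  # grid spacing in degrees, as described above


def point_in_polygon(x, y, polygon):
    """Even-odd ray casting; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_admin_units(min_lon, min_lat, ncols, nrows, units):
    """Return {(col, row): unit_name} for pixels whose centre lies in a unit."""
    result = {}
    for row in range(nrows):
        for col in range(ncols):
            lon = min_lon + (col + 0.5) * CELL
            lat = min_lat + (row + 0.5) * CELL
            for name, polygon in units.items():
                if point_in_polygon(lon, lat, polygon):
                    result[(col, row)] = name
                    break
    return result


# Hypothetical example: two made-up rectangular units side by side.
units = {
    "West": [(0.0, 0.0), (2 * CELL, 0.0), (2 * CELL, 2 * CELL), (0.0, 2 * CELL)],
    "East": [(2 * CELL, 0.0), (4 * CELL, 0.0), (4 * CELL, 2 * CELL), (2 * CELL, 2 * CELL)],
}
grid = assign_admin_units(0.0, 0.0, ncols=4, nrows=2, units=units)
for key in sorted(grid):
    print(key, grid[key])
```

Pixels whose centre falls in no polygon are simply left out of the result, which corresponds to step (a).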

I wrote some Java code to do this, as Shapefiles are easy to parse, and then I used Java's basic polygon classes to rasterise the shapes onto a grid. Where one or more shape lines cut through a pixel, I sample with higher resolution to see how many subsamples are in each competing shape, and then assume the pixel is the "most popular" shape (which is the necessary approximation of rasterising).
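The "most popular shape" idea can be sketched as follows. This is not the Java code mentioned above, just a minimal Python illustration of the same approach, assuming an n x n grid of subsamples per pixel and simple vertex-list polygons; `majority_unit` and the two example units are hypothetical.

```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Even-odd ray casting; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def majority_unit(lon0, lat0, size, units, n=4):
    """Subsample a pixel on an n x n grid and return the unit hit most often.

    (lon0, lat0) is the lower-left corner of the pixel and size its width.
    Returns None if no subsample lands in any unit.
    """
    votes = Counter()
    for i in range(n):
        for j in range(n):
            x = lon0 + (i + 0.5) * size / n
            y = lat0 + (j + 0.5) * size / n
            for name, polygon in units.items():
                if point_in_polygon(x, y, polygon):
                    votes[name] += 1
                    break
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical pixel [0, 1] x [0, 1] cut by a boundary at x = 0.7.
units = {
    "A": [(0.0, 0.0), (0.7, 0.0), (0.7, 1.0), (0.0, 1.0)],
    "B": [(0.7, 0.0), (1.0, 0.0), (1.0, 1.0), (0.7, 1.0)],
}
print(majority_unit(0.0, 0.0, 1.0, units))
```

In this example 12 of the 16 subsamples land in unit A, so the whole pixel is attributed to A.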

I will try and get that into a more generic form for the repo at some point, although it is by no means the only (or likely best) way to do it. I had forgotten worldpop comes as a TIF; I find that an awkward format to work with. GDAL is a good call:

gdal_translate ppp_2020_1km_Aggregated.tif worldpop2020global.bil

will convert worldpop to a simple binary grid (BIL) and a metadata file (HDR).
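Once the grid is in that form, turning it into a list of cells by lower-left longitude/latitude could look roughly like the sketch below. It rests on several assumptions that should be checked against the HDR file GDAL actually writes: a single band of 32-bit little-endian floats, the usual ESRI header keys (NROWS, NCOLS, XDIM, YDIM, ULXMAP, ULYMAP), and ULXMAP/ULYMAP giving the centre of the upper-left pixel. The 2 x 2 grid is made up, and `parse_hdr` and `cells_from_bil` are hypothetical helper names.

```python
import struct


def parse_hdr(text):
    """Parse 'KEY value' header lines into a dict with upper-case keys."""
    fields = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            fields[parts[0].upper()] = parts[1]
    return fields


def cells_from_bil(raw, hdr):
    """Return (lower_left_lon, lower_left_lat, population) for populated cells."""
    nrows, ncols = int(hdr["NROWS"]), int(hdr["NCOLS"])
    xdim, ydim = float(hdr["XDIM"]), float(hdr["YDIM"])
    # Assumption: ULXMAP/ULYMAP are the centre of the upper-left pixel.
    left = float(hdr["ULXMAP"]) - xdim / 2
    top = float(hdr["ULYMAP"]) + ydim / 2
    # Assumption: one band of little-endian 32-bit floats, row by row from the top.
    values = struct.unpack("<%df" % (nrows * ncols), raw)
    cells = []
    for row in range(nrows):
        for col in range(ncols):
            pop = values[row * ncols + col]
            if pop > 0:  # skips empty cells and negative no-data values
                cells.append((left + col * xdim, top - (row + 1) * ydim, pop))
    return cells


# Made-up 2 x 2 grid covering [0, 1] x [0, 1] with 0.5 degree pixels.
hdr = parse_hdr("""NROWS 2
NCOLS 2
XDIM 0.5
YDIM 0.5
ULXMAP 0.25
ULYMAP 0.75
""")
raw = struct.pack("<4f", 10.0, 0.0, 0.0, 5.0)
cells = cells_from_bil(raw, hdr)
for cell in cells:
    print(cell)
```

For a real file, `raw` would come from reading the .bil in binary mode and the header text from the accompanying .hdr file.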

But can I politely repeat, as per the readme, these are support/training questions we don't have capacity for at the moment. More documentation is coming, but I don't want you to have the expectation that I can step you through building an entire epi model of Bermuda via a github issue, as there will be plenty more questions to come I am sure.

weshinsley commented 4 years ago

For example... wpop_bmu_adm1.txt wpop_bmu_adm1_index.txt

kevinminors commented 4 years ago

What an absolute legend! You're a hero @weshinsley! Thank you! I'll run these over the next couple days. I really appreciate this! Thanks again!

weshinsley commented 4 years ago

No probs... (the "s" in the last line of the index file is a typo...)

kevinminors commented 4 years ago

Just as a heads up, I don't think the model likes spaces in the admin unit names. For example, I had to replace 'Hamilton Municipality' with 'Hamilton_Municipality' because it would stop reading the admin units once it saw the space.
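A small sketch of that workaround, assuming the names are cleaned up before the index file is written; `sanitise_name` is a hypothetical helper, not part of the model.

```python
def sanitise_name(name):
    """Replace each run of whitespace in an admin unit name with one underscore."""
    return "_".join(name.split())


print(sanitise_name("Hamilton Municipality"))
```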

kevinminors commented 4 years ago

It took me way too long to figure this out ahahahaaha