Closed geohacker closed 6 years ago
@moradology @jpolchlo - could you drop some overall outline of estimation approach here? Thank you!
This task requires building a model from worldpop and a building density estimate derived from OSM data (perhaps with participation from other data sources).
The current thinking is that a linear regression relating density to worldpop will be sufficient. This estimator would be built by looking at vector tiles covering areas that are known to be reasonably complete, thereby producing reliable estimates of building density that can then be compared to the worldpop raster values. Over each known-good vector tile, we would generate a building density raster at the same resolution as worldpop, and take the corresponding pixel values in these input rasters (including any supplemental inputs) as samples in our model construction.
Initially, we assumed that worldpop would need to be the dependent variable because of its measurement error. Taking worldpop as an independent variable to predict building density would deflate the predicted building densities, since noise in a regressor biases the fitted slope toward zero.
We also assume that building use patterns differ based on surroundings. Demand for space in urban areas suggests that there will be fewer square feet of living space per individual in those environments than in more agrarian areas. We believe that a land usage raster input would therefore be a valuable addition to the model, though this may prove not to be the case.
Thus, at present, our best guess of a model is
WORLDPOP = beta_1 * DENSITY + beta_2 * LANDCOVER + error
However, it is understood that density is a more useful dependent variable, so flipping the order of worldpop and density is worth trying, with the caveat that the approximate nature of worldpop can lead to underestimates of density.
And, to be sure, other methodologies besides linear regression are worth investigating as well. But the nature of the problem statement suggests that we try the simplest approaches first.
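The deflation caveat above can be illustrated with a small simulation. This is a sketch on synthetic data only: the slope, the noise levels, and all variable names are assumptions chosen for illustration, not values taken from worldpop or OSM. It regresses density on a noisy stand-in for worldpop and compares the fitted slope with the one obtained from the noise-free population.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "true" population per pixel, and a building density that depends on it.
n = 5000
true_pop = rng.uniform(0.0, 100.0, size=n)
true_slope = 0.5  # assumed density per person, for illustration only
density = true_slope * true_pop + rng.normal(0.0, 2.0, size=n)

# Stand-in for worldpop: the true population plus measurement error (assumed).
worldpop = true_pop + rng.normal(0.0, 30.0, size=n)


def ols_slope(x, y):
    """Slope b of the least-squares line y ~ a + b * x."""
    x_centered = x - x.mean()
    return float(np.dot(x_centered, y - y.mean()) / np.dot(x_centered, x_centered))


slope_clean = ols_slope(true_pop, density)   # regressor without measurement error
slope_noisy = ols_slope(worldpop, density)   # regressor with measurement error

print(f"slope using true population: {slope_clean:.3f}")
print(f"slope using noisy worldpop:  {slope_noisy:.3f}")
```

The second slope comes out noticeably smaller than the first, which is the underestimate of density described above. How strong the effect is in practice depends on how noisy worldpop really is, which this sketch does not attempt to measure.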
After having a conversation with another analyst here, I can say that we have a better notion of how this process is likely to work.
I expressed a concern in an earlier meeting about the uniformity of worldpop relative to the detail of the building density raster that I've generated, as seen here:
The lack of granularity in worldpop implies that we can only generate meaningful analyses over large aggregate areas, and not at the pixel level.
The process as I am currently thinking of it is to aggregate over areas of interest (regions to be provided by the consumer). The actual expected population (ACTUAL) can be computed directly from worldpop by summing the contributions of the cells of that raster which intersect the AOI. After estimating the model coefficients described in my last entry, they can be applied either directly to the building footprints or the building density raster over the same AOI to develop an approximate expected population (APPROX) count for the area of interest.
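A minimal sketch of the aggregation step, assuming the AOI has already been rasterized to a boolean mask on the worldpop grid and that the fitted model reduces to a single per-pixel coefficient on building density. The function names, the coefficient value, and the toy rasters are hypothetical, and cells that only partially intersect the AOI are not weighted here.

```python
import numpy as np


def actual_population(worldpop, aoi_mask):
    """ACTUAL: sum of worldpop cells inside the AOI, ignoring nodata (NaN)."""
    return float(np.nansum(np.where(aoi_mask, worldpop, 0.0)))


def approx_population(density, aoi_mask, beta_density):
    """APPROX: per-pixel model prediction summed over the AOI.

    beta_density is a hypothetical fitted coefficient (people per unit density).
    """
    return float(np.nansum(np.where(aoi_mask, beta_density * density, 0.0)))


# Toy 4x4 rasters on a shared grid (values are made up).
worldpop = np.full((4, 4), 2.0)
worldpop[0, 0] = np.nan  # a nodata cell inside the AOI

density = np.zeros((4, 4))
density[:2, :2] = [[0.1, 0.2], [0.3, 0.2]]

aoi_mask = np.zeros((4, 4), dtype=bool)
aoi_mask[:2, :2] = True  # AOI covers the upper-left 2x2 block

actual = actual_population(worldpop, aoi_mask)
approx = approx_population(density, aoi_mask, beta_density=10.0)
print(actual, approx)
```

Summing per-pixel predictions is only one of the two options mentioned; applying the coefficients directly to footprint areas would replace the density raster with a per-building area total.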
The ratio of APPROX/ACTUAL should be near 1 if the AOI has good OSM coverage; values substantially below 1 indicate poor coverage, since missing buildings lower APPROX; and values substantially above 1 indicate good coverage of an area which has experienced population growth since the last worldpop estimate.
In terms of effecting this model, some training data is required. Notably, a list of (zoom, x, y) triples corresponding to the TMS IDs of vector tiles which have relatively complete OSM coverage will be needed per country. The longer the list, the better the estimate is likely to be.
A second step could be effected by also providing a list of tiles and a rough idea of how complete the building coverage is in order to calibrate the ranges of the APPROX/ACTUAL ratio. This is a "nice to have", and could easily be calibrated when selecting, say, color maps during the UI design process.
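For turning the (zoom, x, y) triples into geographic extents, a sketch along these lines may be useful. It assumes Web Mercator tiles and treats the row index as true TMS (rows counted from the south) by default; if the IDs actually follow the XYZ/slippy-map convention, pass tms=False. The function name is hypothetical.

```python
import math


def tile_bounds(zoom, x, y, tms=True):
    """Return (west, south, east, north) in degrees for a Web Mercator tile."""
    n = 2 ** zoom
    if tms:
        # TMS counts rows from the south; XYZ counts them from the north.
        y = n - 1 - y

    def lon(col):
        return col / n * 360.0 - 180.0

    def lat(row):
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))

    return lon(x), lat(y + 1), lon(x + 1), lat(y)


# Example: the single zoom-0 tile spans the whole Web Mercator extent.
print(tile_bounds(0, 0, 0))
```

Getting the row convention wrong mirrors tiles across the equator, so it is worth checking one known tile against a map before processing a whole country.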
"The lack of granularity in worldpop implies that we can only generate meaningful analyses over large aggregate areas, and not at the pixel level."
Yes, this is the exact recommendation from WorldPop. 👍
Agreed on the concern about WorldPop. They are working on improved methods through a new population grid project that is in the works. This new grid will also be a global model, so we may not have to do country-by-country calculations and can instead calculate a global layer.
@sajjad and I talked about a couple things:
At present, it makes sense to produce a GeoJSON file per country that contains features for each TMS tile at zoom 12. For frame of reference, that grid will appear roughly at the following resolution:
The model currently involves computing, for each of these tile regions:
For some collection of training tiles—tiles which have some OSM buildings defined, with few to no buildings not captured in OSM—we can take the above measurements and perform a least-squares fit of the model
TOTALPOP = beta1 * BUILDAREA + beta2
This is possibly overkill for this simple model, but it will be a bit future-proof if we need to add more variables down the line to improve accuracy (and it's not too complicated to do).
Given this estimate, we have a means to predict the population that is expected for some tile. Call it ESTPOP.
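A minimal sketch of the fit and the prediction, using NumPy's least-squares solver. The function names and the training numbers are hypothetical and only illustrate the shape of the computation; BUILDAREA and TOTALPOP are assumed to be one value per training tile.

```python
import numpy as np


def fit_population_model(build_area, total_pop):
    """Least-squares fit of TOTALPOP = beta1 * BUILDAREA + beta2."""
    build_area = np.asarray(build_area, dtype=float)
    total_pop = np.asarray(total_pop, dtype=float)
    # Design matrix: one column per variable, plus a column of ones for beta2.
    design = np.column_stack([build_area, np.ones_like(build_area)])
    (beta1, beta2), *_ = np.linalg.lstsq(design, total_pop, rcond=None)
    return float(beta1), float(beta2)


def predict_population(build_area, beta1, beta2):
    """ESTPOP for one or more tiles, given fitted coefficients."""
    return beta1 * np.asarray(build_area, dtype=float) + beta2


# Made-up training tiles that lie exactly on TOTALPOP = 0.02 * BUILDAREA + 5.
train_area = [1000.0, 2000.0, 3000.0, 4000.0]
train_pop = [25.0, 45.0, 65.0, 85.0]

beta1, beta2 = fit_population_model(train_area, train_pop)
est_pop = float(predict_population(2500.0, beta1, beta2))
print(beta1, beta2, est_pop)
```

Adding another variable later would mean appending one more column to the design matrix, which is the future-proofing mentioned above.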
For each tile in the output GeoJSON, then, we can provide the following attributes:
This last ratio is the key: This value, in a perfect world, would be 1. In poor coverage areas, it will be less. This ratio should determine a heatmap for display. The specific mapping of colors to values will have to be determined empirically.
Note that this ratio will be more unstable in low-population areas, so some adjustment may be required to get a more reliable metric.
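One possible adjustment, sketched below, is to add a pseudocount to both terms so that the ratio is pulled toward 1 when populations are small. This is an assumption for illustration, not a method settled on in this thread. The sketch also assumes the ratio is ESTPOP divided by TOTALPOP, which matches the statement that it drops below 1 where coverage is poor; the function name and the pseudocount value are hypothetical.

```python
def coverage_ratio(est_pop, total_pop, pseudocount=0.0):
    """Ratio of predicted to worldpop population, optionally smoothed.

    With pseudocount > 0, tiles with very small populations are pulled
    toward 1 instead of swinging wildly on a few people of difference.
    Returns None when the denominator is not positive.
    """
    est_pop = max(est_pop, 0.0)  # a linear model can predict below zero
    denominator = total_pop + pseudocount
    if denominator <= 0:
        return None
    return (est_pop + pseudocount) / denominator


# Low-population tile: the raw ratio is extreme, the smoothed one is near 1.
raw_small = coverage_ratio(2.0, 1.0)
smooth_small = coverage_ratio(2.0, 1.0, pseudocount=50.0)

# High-population tile: smoothing barely changes the ratio.
raw_large = coverage_ratio(500.0, 1000.0)
smooth_large = coverage_ratio(500.0, 1000.0, pseudocount=50.0)

print(raw_small, smooth_small, raw_large, smooth_large)
```

The pseudocount would need to be tuned empirically, in the same way as the color mapping.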
Another thing worth pointing out that I'm sure will be visible in the final output is that WorldPop in Botswana will often distribute population over farming land at the same, or close to the same, rate as it does over cities:
The region on the right has a density of 3.72 people/pixel and the region on the left has a density of 2.19 people/pixel. However:
and zooming in:
Building estimates are on S3! s3://hotosm-population/prediction
We're working with Azavea to build building estimates using WorldPop data as a signal. Still waiting to kick off this work.
cc @hotosm/osma-health