Closed by carlhiggs 1 year ago
@xavidelclos @marcdmallafre I have just drafted the implementation for using custom vector data, tested using the 2021 Catalunya population data with demographic strata from Idescat (https://biblio.idescat.cat/publicacions/Record/21104 ; the download link was working again, so used this). I think it works quite well! There is a small caveat for population density indicators in the current implementation. Will be keen to hear your thoughts when you get a chance to read the details below.
As per #298, I've also allowed all datasets to be defined directly in the region configuration yml, as an alternative to using datasets.yml to define shared datasets. The configuration files for a demographically stratified analysis of indicators for Tarragona, Catalunya are below. These specify both the source data and the field to be used for the population estimate. To keep results comparable regardless of the data used, the relevant field is renamed to 'pop_est', and the population layer itself is renamed to reflect an alias for the vector population data and the configured field used for estimates (e.g. population_catalunya_2021_p_15_64).
So, the key difference really is that vector_population_data_field: P_15_64 is changed to vector_population_data_field: P_65_I_MES, although I also changed some of the comments and prose bits to reflect this when reporting.
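To make the renaming concrete, here is a minimal sketch of the idea, not the actual implementation. The helper names, the alias string "catalunya_2021", the "cell_id" key and the toy counts are all assumptions for illustration; only the field names and the resulting layer name come from the example above.

```python
def population_layer_name(alias: str, field: str) -> str:
    """Build a layer name from a data alias and the configured field (assumed pattern)."""
    return f"population_{alias}_{field}".lower()


def standardise_population_field(records, field):
    """Return copies of the records with the configured field renamed to 'pop_est'."""
    return [
        {("pop_est" if key == field else key): value for key, value in row.items()}
        for row in records
    ]


# Toy record with made-up counts, purely for illustration.
rows = [{"cell_id": 1, "TOTAL": 120, "P_15_64": 80, "P_65_I_MES": 25}]

for field in ("P_15_64", "P_65_I_MES"):
    layer = population_layer_name("catalunya_2021", field)
    renamed = standardise_population_field(rows, field)
    print(layer, renamed[0]["pop_est"])
```

Because both analyses end up with the same 'pop_est' field, downstream code does not need to know which demographic stratum was configured.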
Here's an example of the summary of differences when viewed using the new local browser web app:
... now -- looking at this I start to see the limitations of this particular implementation when applied to sub-population analyses such as this:
Perhaps this is useful enough for now to allow for stratified analyses, noting that some population-specific indicators need to be interpreted with care, as they relate to the population sub-group itself rather than the broader population.
This is currently on the enhancements branch, if you want to give it a go.
In the commit linked above I added optional specification of a population_denominator variable that can be used when evaluating neighbourhood population density for stratified population sub-group analyses. This means that the overall population density is used, while the population of interest is used for weighting indicators for cohort sub-group specific estimates.
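As a rough illustration of why the denominator matters, the sketch below computes a sub-group weighted mean density with and without a separate denominator field. It is a simplification that assumes density is evaluated per grid cell rather than per walkable neighbourhood; the function name, the "area_sqkm" key and the toy numbers are hypothetical.

```python
def weighted_mean_density(cells, weight_field="pop_est", denominator_field=None):
    """Sub-group weighted mean of cell density.

    Density uses denominator_field (e.g. the total population) when given,
    otherwise it falls back to the sub-group field used for weighting.
    """
    denominator_field = denominator_field or weight_field
    weighted_sum = 0.0
    total_weight = 0.0
    for cell in cells:
        density = cell[denominator_field] / cell["area_sqkm"]
        weighted_sum += density * cell[weight_field]
        total_weight += cell[weight_field]
    return weighted_sum / total_weight


# Two toy cells with made-up counts: 'pop_est' is the sub-group, 'TOTAL' everyone.
cells = [
    {"area_sqkm": 1.0, "TOTAL": 1000, "pop_est": 100},
    {"area_sqkm": 1.0, "TOTAL": 3000, "pop_est": 900},
]

subgroup_only = weighted_mean_density(cells)
with_total = weighted_mean_density(cells, denominator_field="TOTAL")
print(subgroup_only, with_total)
```

Without a denominator, the density only counts the sub-group itself, so it understates how densely populated the neighbourhoods are; with TOTAL as the denominator, the sub-group only determines the weights.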
In the case of Tarragona for persons 15 to 64 vs 65 and older, this now appears more sensible, with similar neighbourhood population density estimates overall. The difference reflects spatial variation in the sub-group population, while both groups used the overall population for evaluating the density. So, on average, persons aged 15 to 64 live in neighbourhoods that appear slightly less walkable than those of persons aged 65 years and older -- although the differences are smaller, and both groups tend to live in the more walkable parts of the city. The older cohort have better access to large public open space, convenience stores and fresh food markets, based on the data used to identify these locations using OpenStreetMap. Consequently, the score for access to daily living amenities is also higher for the older group. Overall, the combined effect of older persons tending to live in more densely populated neighbourhoods with better access to amenities results in the difference in population weighted walkability scores. Without weighting for sub-group population, the spatial average of walkability is basically equivalent (which makes sense, given it's the same city and the same denominator population is used to evaluate density).
I think that's useful, now that analyses can be configured to use population_denominator: TOTAL.
Oh, here's an example of the spatial distribution of walkability estimates for persons 15 to 64, using the official statistics vector grid data instead of modelled raster population estimates:
Carl, this looks great! It will be very useful. In the following weeks we could try to run it with other vector layers such as the census tracts. @marcdmallafre could try to run it for some of the Spanish cities if the current version of the software allows for it.
Hi @xavidelclos , the current main branch zip file should work with this now --- I'm waiting to make a few more changes before doing the next release, so it's not in the list of formal releases yet. If you give it a go @marcdmallafre let me know how it goes!
Currently, population data is configured using a raster grid (e.g. Global Human Settlements Layer population data). This is vectorised and then used to take the average of sample point estimates within each grid cell, which are then further aggregated for the overall region as a population weighted average of the grid small area estimates.
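The two-stage aggregation described above can be sketched as follows. This is a toy illustration using plain Python structures rather than the spatial database the software actually uses; the function names, cell identifiers and values are hypothetical.

```python
from collections import defaultdict


def cell_means(sample_points):
    """Average sample point estimates within each grid cell."""
    grouped = defaultdict(list)
    for cell_id, value in sample_points:
        grouped[cell_id].append(value)
    return {cell_id: sum(values) / len(values) for cell_id, values in grouped.items()}


def population_weighted_average(means, population):
    """Aggregate cell averages to the region, weighting by cell population."""
    total_population = sum(population[cell_id] for cell_id in means)
    weighted = sum(means[cell_id] * population[cell_id] for cell_id in means)
    return weighted / total_population


# (cell id, indicator estimate) pairs and cell populations: made-up toy values.
points = [("a", 1.0), ("a", 3.0), ("b", 4.0)]
population = {"a": 100, "b": 300}

means = cell_means(points)
region_estimate = population_weighted_average(means, population)
print(means, region_estimate)
```

The same second stage applies whether the small areas come from a vectorised raster or from an imported vector layer, which is what makes the two data formats interchangeable in principle.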
Many countries, including Australia and Spain, supply population data using vector file formats for administrative areas, more commonly than as population grids. For example, if you want information on demographic sub-groups, or want to communicate using official statistical areas, the raster grid may not be the best option. Also, some raster data products (like GHS-POP) are modelled estimates rather than direct reflections of census counts, and so again, in some instances official data may be preferred.
So, support for using administrative population data in a vector file format has been requested by some of our early adopters (including @xavidelclos).
In principle, a modification to the software to allow optional configuration of this alternate data format should be do-able. For example, if a raster format is configured, analysis proceeds as is currently the case to develop a vectorised small area grid; alternatively, if a vector format is configured, this is imported directly to serve as an equivalent small area vector grid (though not necessarily of equal areas).
There are currently some assumptions around field names and data structures that may have to be thought through with this modification, but in principle, it is do-able and may make it easier for creating population specific urban indicators, and for sensitivity analyses comparing gridded population data with official census data products distributed in vector formats.
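A minimal sketch of how such a branch might look is given below. It only returns a description of the steps rather than doing any spatial processing, and it is not the software's actual code: the function name and the "data_type" key are hypothetical, while vector_population_data_field and 'pop_est' follow the naming used in this thread.

```python
def plan_population_import(config):
    """Return the processing steps implied by the configured population data format."""
    data_type = config.get("data_type", "")  # hypothetical key, e.g. "raster" or "vector"
    if data_type.startswith("raster"):
        return ["import raster", "vectorise to small area grid"]
    if data_type.startswith("vector"):
        field = config.get("vector_population_data_field")
        if not field:
            raise ValueError("vector population data needs a population field")
        return ["import vector layer", f"rename {field} to pop_est"]
    raise ValueError(f"unsupported population data type: {data_type!r}")


print(plan_population_import({"data_type": "raster"}))
print(plan_population_import(
    {"data_type": "vector", "vector_population_data_field": "P_15_64"}
))
```

Requiring the field name up front for vector data is one way to surface the assumptions about field names early, instead of failing later in the analysis.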