kauebraga closed this issue 3 years ago
Added at https://github.com/ipeaGIT/acesso_oport/commit/ab4c28a06202d56a1f6eafb5eb158261f5f5c07d, but the sum of moradores by income bracket falls short of the moradores variables. They were supposed to be the same (both count "moradores em domicílios particulares permanentes", i.e. residents in permanent private households).
The Census documentation explains why:
"The Entorno (surroundings) information was collected for urban tracts. Some tracts in aglomerados subnormais were not collected. Coverage was over 96%. The tracts where the Entorno information was not collected are those that show a value of zero for all variables in the files entorno01, entorno02, entorno03, entorno04 and entorno05."
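Following the rule in the documentation, the tracts without Entorno collection can be flagged by checking whether all of their Entorno variables are zero. A minimal sketch, assuming each tract is a plain dict; the column names below are hypothetical placeholders, not the actual IBGE variable codes:

```python
from typing import Mapping, Sequence


def entorno_not_collected(tract: Mapping[str, object], entorno_cols: Sequence[str]) -> bool:
    """True if every Entorno variable of a tract is zero.

    Per the Census documentation, tracts where the Entorno survey was
    not carried out have zero in all Entorno variables.
    Caveat: a tract with no residents at all would also match this test.
    """
    return all(tract[col] == 0 for col in entorno_cols)


# Illustrative records; "entorno_x"/"entorno_y" are hypothetical column names.
tracts = [
    {"code": "A", "entorno_x": 0, "entorno_y": 0},
    {"code": "B", "entorno_x": 3, "entorno_y": 0},
]
cols = ["entorno_x", "entorno_y"]
not_collected = [t["code"] for t in tracts if entorno_not_collected(t, cols)]
print(not_collected)  # ['A']
```

In a real run the column list would span all five Entorno files, since the documented pattern is "zero for all information" across entorno01 to entorno05.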
=(
96% is a lot. But it's important to have a more precise diagnosis of how much we lose. I suggest calculating the difference between the population estimates from the two files for each hexagon. Let's look at the distribution of this difference to see how it turns out (both as a summary and as a map).
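The suggested diagnostic amounts to aggregating both population estimates to hexagons and subtracting them. A minimal sketch, assuming each tract record carries a hypothetical `hex_id`, the total residents (`moradores_total`) and a list of residents per income bracket from Entorno04 (`moradores_by_bracket`). It also assumes, as a simplification, that each tract falls entirely within one hexagon; a real pipeline would more likely split tracts across hexagons by area.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def hexagon_difference(tracts: Iterable[Mapping]) -> Dict[str, int]:
    """Total residents minus residents summed over income brackets, per hexagon.

    Assumes (hypothetically) that each tract belongs to exactly one hexagon.
    """
    total = defaultdict(int)       # estimate from the total-residents variable
    by_bracket = defaultdict(int)  # estimate from the income-bracket variables
    for t in tracts:
        total[t["hex_id"]] += t["moradores_total"]
        by_bracket[t["hex_id"]] += sum(t["moradores_by_bracket"])
    return {h: total[h] - by_bracket[h] for h in total}


# Illustrative numbers only; the second tract mimics one with no Entorno data.
tracts = [
    {"hex_id": "h1", "moradores_total": 500, "moradores_by_bracket": [200, 300]},
    {"hex_id": "h1", "moradores_total": 800, "moradores_by_bracket": [0, 0]},
    {"hex_id": "h2", "moradores_total": 400, "moradores_by_bracket": [150, 250]},
]
print(hexagon_difference(tracts))  # {'h1': 800, 'h2': 0}
```

The resulting per-hexagon differences can then be summarised or joined back to the hexagon geometries for a map.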
[Figure: distribution of tracts by difference. x axis: breaks of the difference moradores_total - sum(moradores by SM); y axis: count of tracts in each break.]
Only tracts in the project's municipalities are included.
Close to 90% of the tracts are fine (moradores_total = sum(moradores by SM))
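Binning the per-tract difference into breaks and counting the tracts in each break reproduces the kind of histogram described above, and the share of tracts with a zero difference is the "fine" share. A minimal sketch; the break points are hypothetical, since the exact breaks used in the figure are not listed in the thread:

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical break points (label, lower bound, upper bound), inclusive.
BREAKS: List[Tuple[str, float, float]] = [
    ("0", 0, 0),
    ("1-10", 1, 10),
    ("11-100", 11, 100),
    ("101-1000", 101, 1000),
    (">1000", 1001, float("inf")),
]


def difference_category(diff: int) -> str:
    """Map moradores_total - sum(moradores by SM) to a break label."""
    for label, low, high in BREAKS:
        if low <= diff <= high:
            return label
    return "negative"  # the bracket sum exceeds the total


def count_by_break(diffs: Iterable[int]) -> Counter:
    """Number of tracts in each break."""
    return Counter(difference_category(d) for d in diffs)


# Illustrative per-tract differences, not real data.
diffs = [0, 0, 0, 5, 250, 1500]
counts = count_by_break(diffs)
share_ok = counts["0"] / len(diffs)
print(share_ok)  # 0.5
```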
Taking a sample of the biggest cities, where the x axis represents the % of tracts in each difference category:
Salvador, Recife and Rio are the most problematic. Is this pattern due to aglomerados subnormais?
As I understand the figure, around 5% (or so) of census tracts in Recife and Salvador have "lost" more than 1,000 inhabitants when we consider the Entorno04 variables (and around 25% have lost 1 or more inhabitants). Is that how you'd read this?
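The reading above corresponds to two shares per city: tracts losing at least 1 inhabitant and tracts losing more than 1,000. A minimal sketch, assuming the per-tract differences are already grouped by city; the city keys and values below are illustrative placeholders, not the project's results:

```python
from typing import Dict, Iterable, Tuple


def loss_shares(diffs_by_city: Dict[str, Iterable[int]]) -> Dict[str, Tuple[float, float]]:
    """For each city: (share of tracts losing >= 1 inhabitant,
    share of tracts losing more than 1,000 inhabitants)."""
    out = {}
    for city, diffs in diffs_by_city.items():
        diffs = list(diffs)
        n = len(diffs)
        out[city] = (
            sum(d >= 1 for d in diffs) / n,
            sum(d > 1000 for d in diffs) / n,
        )
    return out


# Hypothetical cities and differences, for illustration only.
example = {"city_a": [0, 0, 0, 2, 1500], "city_b": [0, 0, 1, 1]}
print(loss_shares(example))  # {'city_a': (0.4, 0.2), 'city_b': (0.5, 0.0)}
```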
yes!
Well done. @kauebraga , could you please generate the map of the hexagons in Rio showing where these differences occur?
On visual inspection, there seems to be a big overlap with the distribution of favelas, which is expected.
The thing is, this is the most appropriate variable for getting detailed information on people's income at a high spatial resolution (number of people living in households with a per capita household income between $ and $$). However, this variable is missing for some census tracts with informal housing, which raises two issues. First, this variable underestimates the most vulnerable, low-income population. While this is not a big issue for most cities, it can be particularly problematic for Salvador, Recife and, to a lesser extent, Rio.
The second issue is that this variable might also introduce some bias in the spatial distribution of low-income families. In the case of Rio (figure above), these data seem to underestimate low-income families more in the east and northeast portions of the city (where accessibility levels tend to be greater). So it's important to keep in mind that this bias in the underlying population and income data will also influence any analysis that tries to estimate accessibility inequalities between income levels.
@kauebraga, could you please generate similar maps for Salvador and Recife, just so we get a sense of potential spatial biases in these cities as well?
@kauebraga, could you please generate a table with the absolute and relative number of people we miss by using this variable?
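The requested table boils down to, per city, missed = sum(moradores_total) - sum(moradores by income bracket), and relative = missed / sum(moradores_total). A minimal sketch, assuming tract records with a hypothetical `city` key, `moradores_total` and a `moradores_by_bracket` list; the numbers are made up for illustration:

```python
from typing import Dict, Iterable, Mapping, Tuple


def missing_people_table(tracts: Iterable[Mapping]) -> Dict[str, Tuple[int, float]]:
    """Absolute and relative number of residents missed per city.

    Assumes every city has a positive total population (no zero division guard).
    """
    total: Dict[str, int] = {}
    counted: Dict[str, int] = {}
    for t in tracts:
        city = t["city"]
        total[city] = total.get(city, 0) + t["moradores_total"]
        counted[city] = counted.get(city, 0) + sum(t["moradores_by_bracket"])
    return {
        city: (total[city] - counted[city], (total[city] - counted[city]) / total[city])
        for city in total
    }


# Illustrative tracts; the second one mimics a tract with no Entorno data.
tracts = [
    {"city": "city_a", "moradores_total": 1000, "moradores_by_bracket": [400, 500]},
    {"city": "city_a", "moradores_total": 500, "moradores_by_bracket": [0]},
    {"city": "city_b", "moradores_total": 2000, "moradores_by_bracket": [1000, 1000]},
]
table = missing_people_table(tracts)
for city, (missed, rel) in sorted(table.items()):
    print(f"{city:<10} {missed:>8} {rel:>7.1%}")
```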
We decided not to include this variable in the data because of missing information for residents of aglomerados subnormais. It's a shame, but we hope to get better data in the next population census.
From dataset: Entorno04