ipeaGIT / acesso_oport

IPEA - Projeto Acesso a Oportunidades
https://www.ipea.gov.br/acessooportunidades/

Add new income data to the census tracts #72

Closed kauebraga closed 3 years ago

kauebraga commented 3 years ago

From dataset Entorno04

kauebraga commented 3 years ago

Added in https://github.com/ipeaGIT/acesso_oport/commit/ab4c28a06202d56a1f6eafb5eb158261f5f5c07d, but the sum of moradores across the income brackets falls short of the total moradores variable. They were supposed to be the same (both are "moradores em domicílios particulares permanentes", i.e. residents in permanent private households).

kauebraga commented 3 years ago

This passage from the Census documentation explains it:

The Entorno (surroundings) information was collected for urban tracts. Some tracts in aglomerados subnormais were not surveyed. Coverage was above 96%. The tracts where the surroundings information was not collected are those that show a value of zero for every variable in the files entorno01, entorno02, entorno03, entorno04 and entorno05.
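A minimal sketch of how the rule quoted above (all variables equal to zero means the tract was not surveyed) could be checked. The field names and values are hypothetical, not the real Entorno04 layout.

```python
# Hypothetical records: one dict per census tract. Field names and
# numbers are made up for illustration, not the real Entorno04 columns.
tracts = [
    {"code_tract": "A", "inc_low": 120, "inc_mid": 80, "inc_high": 40},
    {"code_tract": "B", "inc_low": 0, "inc_mid": 0, "inc_high": 0},
    {"code_tract": "C", "inc_low": 0, "inc_mid": 15, "inc_high": 0},
]


def not_collected(tract, id_field="code_tract"):
    """Return True when every variable of the tract (except its id) is zero."""
    return all(value == 0 for key, value in tract.items() if key != id_field)


# Tracts that, under the documentation's rule, had no Entorno collection.
missing = [t["code_tract"] for t in tracts if not_collected(t)]
print(missing)
```

Note that a tract with a single zero bracket (like "C") is not flagged; only tracts where everything is zero are.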

=(

rafapereirabr commented 3 years ago

96% is a lot. But it is important to have a more precise diagnosis of how much we are losing. I suggest computing the difference between the population estimates from the two files for each hexagon. Let's look at that distribution to see how it turns out (both as a summary and as a map).
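Something along these lines could produce the per-hexagon difference and a quick summary. This is only a sketch: the hexagon ids and populations are made up, and it assumes both estimates have already been aggregated to hexagons.

```python
from statistics import median

# Hypothetical population per hexagon from each source (made-up numbers).
pop_total = {"hex_1": 500, "hex_2": 320, "hex_3": 1200, "hex_4": 0}
pop_by_income = {"hex_1": 500, "hex_2": 300, "hex_3": 0, "hex_4": 0}

# Difference per hexagon: total residents minus residents summed over
# the income brackets. A hexagon absent from the second source counts as 0.
diff = {h: pop_total[h] - pop_by_income.get(h, 0) for h in pop_total}

values = sorted(diff.values())
summary = {
    "min": values[0],
    "median": median(values),
    "max": values[-1],
    # Share of hexagons where both estimates agree exactly.
    "share_equal": sum(v == 0 for v in values) / len(values),
}
print(diff)
print(summary)
```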

kauebraga commented 3 years ago

x axis: breaks of the difference moradores_total - sum(moradores by SM)
y axis: count of tracts in each break

Only tracts in the project's municipalities are included. Close to 90% of the tracts are fine (moradores_total = sum(moradores by SM)).
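For reference, a sketch of how the breaks could be computed. The break points, labels and tract values below are assumptions for illustration; the ones used in the plot are not stated here.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical break points and labels (not the ones used in the plot).
edges = [1, 100, 1000]
labels = ["0", "1-99", "100-999", "1000+"]


def diff_category(moradores_total, moradores_by_sm):
    """Label the gap between total residents and the sum over SM brackets.

    Assumes the bracket sum never exceeds the total (difference >= 0).
    """
    diff = moradores_total - sum(moradores_by_sm)
    return labels[bisect_right(edges, diff)]


# Made-up tracts: (moradores_total, [moradores per SM bracket]).
tracts = [
    (800, [300, 300, 200]),
    (650, [0, 0, 0]),
    (1500, [0, 0, 0]),
    (420, [200, 200, 10]),
]

counts = Counter(diff_category(total, by_sm) for total, by_sm in tracts)
shares = {label: counts[label] / len(tracts) for label in labels}
print(counts)
print(shares)
```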

image

kauebraga commented 3 years ago

Taking a sample of the biggest cities, where the x axis represents the % of tracts in each difference category:

image

Salvador, Recife and Rio are the most problematic. Is this a pattern because of aglomerados sub-normais?

dhersz commented 3 years ago

As I understand the figure, around 5% (or so) of census tracts in Recife and Salvador have "lost" more than 1000 inhabitants when we consider the Entorno04 variables (and around 25% have lost 1 or more inhabitants). Is that how you'd read this?

kauebraga commented 3 years ago

yes!

rafapereirabr commented 3 years ago

Well done. @kauebraga , could you please generate the map of the hexagons in Rio showing where these differences occur?

kauebraga commented 3 years ago

image

rafapereirabr commented 3 years ago

On visual inspection, there seems to be a big overlap with the distribution of favelas, which is expected.

Capture

rafapereirabr commented 3 years ago

The thing is, this is the most appropriate variable for getting detailed information about the population's income at a high spatial resolution (number of people living in households with a household income per capita between $ and $$). However, this variable is missing for some census tracts with informal housing. This gives rise to two issues. First, this variable undercounts the most vulnerable low-income population. While this is not a big issue for most cities, it can be particularly problematic for Salvador, Recife and, to a lesser extent, Rio.

The second issue is that this variable might also introduce some bias in the spatial distribution of low-income families. In the case of Rio (figure above), these data seem to underestimate low-income families more in the east and northeast portions of the city (where accessibility levels tend to be greater). So it's important to keep in mind that this bias in the underlying population and income data will also influence any analysis that tries to estimate accessibility inequalities between income levels.

@kauebraga , could you please generate similar maps for Salvador and Recife? Just so we get a sense of potential spatial biases in these cities as well.

kauebraga commented 3 years ago

image

image

rafapereirabr commented 3 years ago

@kauebraga , could you please generate a table with the absolute and relative number of people that we miss out on by using this variable?
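A rough sketch of the kind of table meant here, with absolute and relative losses per city. The rows and numbers are invented for illustration and are not results from the project data.

```python
from collections import defaultdict

# Hypothetical tract rows: (city, total residents, residents summed over
# the income brackets). All numbers are made up.
rows = [
    ("Recife", 1000, 1000),
    ("Recife", 800, 0),
    ("Rio", 1200, 1200),
    ("Rio", 300, 0),
    ("Rio", 500, 500),
]

# Accumulate [total residents, residents with income information] per city.
totals = defaultdict(lambda: [0, 0])
for city, total, by_income in rows:
    totals[city][0] += total
    totals[city][1] += by_income

# Absolute and relative (%) number of residents missed in each city.
table = {}
for city, (total, by_income) in totals.items():
    missed = total - by_income
    table[city] = (missed, 100 * missed / total if total else 0.0)

for city, (missed, pct) in sorted(table.items()):
    print(f"{city:<10}{missed:>8}{pct:>8.1f}%")
```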

rafapereirabr commented 3 years ago

We decided this variable will not be included in the data because of missing information for residents in 'aglomerados subnormais'. It's a shame, but we hope we might be able to get better data in the next population census.