ipeaGIT / geobr

Easy access to official spatial data sets of Brazil in R and Python
https://ipeagit.github.io/geobr/

Add weighting area column to the 2010 census tracts data #262

Open andre-al opened 2 years ago

andre-al commented 2 years ago

It would be useful to add to the census tract data the weighting area that each census tract belongs to, especially since this information is a bit buried and hard to find.

You can find it in the Documentação.zip file on IBGE's own website, under "Documentação/Áreas de Ponderação/Composição das Áreas de Ponderação.txt". The file is essentially a tab-separated (TSV) table with Weighting Area and Census Tract columns.

There are some encoding issues that I didn't try to solve programmatically, but you might have to in order to include it: the whole file seems to be in UTF-16, and the special characters are broken on my computer. My own-use workaround was to extract the archive, navigate the folders with broken names in Explorer, open the file in Notepad, and use Save As with UTF-8 encoding in another folder. After that, readr's read_tsv function reads it normally.
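For anyone who wants to skip the manual Notepad step, here is a minimal Python sketch of a programmatic route. It assumes the file really is UTF-16 with a byte-order mark and that member names inside the archive may come out with mangled accents, so it matches on an ASCII-only fragment of the name. The function name `read_utf16_tsv_from_zip`, the fragment `"Composi"`, the column names, and the demo archive with made-up codes are all hypothetical; none of it has been checked against the real IBGE file.

```python
import csv
import io
import zipfile


def read_utf16_tsv_from_zip(zip_source, ascii_fragment):
    """Return the rows of the first .txt member whose name contains ascii_fragment."""
    with zipfile.ZipFile(zip_source) as archive:
        # Accented characters in member names may be mis-decoded, so match
        # only on an ASCII-only fragment of the file name.
        matches = [
            name for name in archive.namelist()
            if ascii_fragment in name and name.lower().endswith(".txt")
        ]
        if not matches:
            raise FileNotFoundError(f"no .txt member containing {ascii_fragment!r}")
        raw = archive.read(matches[0])
    # The "utf-16" codec reads the byte-order mark to pick endianness and drops it.
    text = raw.decode("utf-16")
    return list(csv.reader(io.StringIO(text, newline=""), delimiter="\t"))


# Demo with an in-memory archive and invented codes (not real IBGE data).
demo_text = "code_weighting\tcode_tract\r\nWA001\tTRACT001\r\nWA001\tTRACT002\r\n"
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("Documentacao/Composicao.txt", demo_text.encode("utf-16"))
buffer.seek(0)

rows = read_utf16_tsv_from_zip(buffer, "Composi")
for row in rows:
    print(row)
```

The same function should accept a path to the downloaded zip instead of the in-memory buffer, since `zipfile.ZipFile` takes either.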

rafapereirabr commented 2 years ago

Thanks @alcantara-pereira. This is very helpful. Adding the code_weighting column to the census tract data should be fairly simple. We only need to do some filter/merge operations in the prep_census_tract.R script. That script is intended to be fully reproducible: it downloads the original data from IBGE and prepares the data that we make available through geobr.

I will try to do this next month, but please feel free to suggest the necessary changes in the script and create a pull request.
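As a rough illustration of the merge step, here is a sketch of the logic in plain Python rather than in the R script itself. It assumes the composition table has a header row followed by (weighting area, census tract) pairs, and that tracts are keyed by a `code_tract` field; the helper `add_weighting_area` and the sample codes are hypothetical. It behaves like a left join: tracts with no match keep a missing value instead of being dropped.

```python
def add_weighting_area(tracts, composition_rows):
    """Attach code_weighting to each tract record via a left join on code_tract."""
    # Skip the (assumed) header row, then map tract code -> weighting area code.
    lookup = {tract_code: area_code for area_code, tract_code in composition_rows[1:]}
    return [
        {**record, "code_weighting": lookup.get(record["code_tract"])}
        for record in tracts
    ]


# Invented sample values, only to show the shape of the data.
composition_rows = [
    ["code_weighting", "code_tract"],
    ["WA001", "TRACT001"],
    ["WA001", "TRACT002"],
]
tracts = [{"code_tract": "TRACT001"}, {"code_tract": "TRACT999"}]

merged = add_weighting_area(tracts, composition_rows)
for record in merged:
    print(record)
```

Keeping unmatched tracts with a missing code makes it easy to count how many tracts fail to match before publishing the data.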