ipeaGIT / geobr

Easy access to official spatial data sets of Brazil in R and Python
https://ipeagit.github.io/geobr/

Distortions in tracts in 2000 #308

Open zehbrandao opened 1 year ago

zehbrandao commented 1 year ago

Hello, there!

As in #275, the problem seems to be with the source data. The shapes for Maricá, Saquarema and Araruama, in Rio de Janeiro, appear misplaced and distorted, and some of them overlap (note that Niterói and São Gonçalo look fine). I was not sure whether to report this, but @rafapereirabr said in #275 that "one of the benefits of geobr is precisely to get rid of these problems and make a clean version of the data easily available", so it seemed reasonable to do so.

In the meantime, may I ask what you think has caused this? Could it be just a projection issue?

Regards,

[Attached image: map of the affected census tracts]

rafapereirabr commented 1 year ago

Hi @zehbrandao, thanks for opening this issue. Can you please share the code you used to create this map? I cannot reproduce the problem.

This is the code I used, and it shows a really good alignment between the census tracts and the background tiles.

library(geobr)
library(mapview)
mapviewOptions(platform = 'leafgl')

# download data
a <- read_census_tract(code_tract = 33, simplified = FALSE)

mapview(a)
[Attached screenshots: 2023-03-09 214634, 2023-03-09 214728]

zehbrandao commented 1 year ago

Yeah, I should've provided clearer details, sorry about that!

The thing is, I'm using Python (3.9) and the base year is 2000. I'd say the problem is related to the year. Anyway, the snippet below still reproduces the issue I reported on my machine.

import geobr

tracts = geobr.read_census_tract(
    code_tract=33,
    year=2000,
    simplified=False,
).to_crs(epsg=31983)

tracts.explore()

Thanks!

rafapereirabr commented 1 year ago

Thanks. I've now been able to reproduce the issue with the code below.

library(geobr)
library(mapview)
mapviewOptions(platform = 'leafgl')

# download data
a <- read_census_tract(code_tract = 33, simplified = FALSE, year = 2000)

mapview(a)

There does indeed seem to be something really strange with the census tracts of the 2000 census. It looks like a problem with the data projection, but it could be more than that. If this is simply a projection issue, we can probably fix it fairly easily: we just need to know the projection of the official data and then transform it to SIRGAS 2000 (EPSG/SRID 4674), which is the official coordinate reference system used by IBGE and by all other data sets in geobr.

This is where the original data comes from. Could you please check if you have the same problem with the original data file?
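
For Python users, the "find the source projection, then transform to EPSG:4674" step could be sketched roughly as follows. This is only an illustration: the real source CRS of the 2000 tracts is not established in this thread, so `ASSUMED_SOURCE_EPSG` is a placeholder, and `relabel_and_reproject` and the toy point are hypothetical names and data, not part of geobr.

```python
import geopandas as gpd
from shapely.geometry import Point

# ASSUMPTION: placeholder only. The true CRS of the official 2000 files is
# not known in this thread; replace this once it has been identified.
ASSUMED_SOURCE_EPSG = 31983  # SIRGAS 2000 / UTM zone 23S
TARGET_EPSG = 4674           # SIRGAS 2000 geographic, used across geobr


def relabel_and_reproject(gdf, source_epsg=ASSUMED_SOURCE_EPSG,
                          target_epsg=TARGET_EPSG):
    """Hypothetical helper: fix a wrong CRS label, then reproject."""
    # set_crs only changes the CRS label; the coordinates stay untouched.
    relabelled = gdf.set_crs(epsg=source_epsg, allow_override=True)
    # to_crs actually converts the coordinates into the target CRS.
    return relabelled.to_crs(epsg=target_epsg)


# Toy data: one made-up point carrying a wrong label (UTM zone 23N).
toy = gpd.GeoDataFrame(
    {"name": ["toy point"]},
    geometry=[Point(755_000, 7_465_000)],
    crs="EPSG:32623",
)

fixed = relabel_and_reproject(toy)
print(fixed.crs.to_epsg(), fixed.geometry.iloc[0])
```

Relabelling and reprojecting are kept as two separate calls on purpose: if the file's CRS label is wrong, calling `to_crs` directly would convert from the wrong starting point.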

zehbrandao commented 1 year ago

Hi!

I tried opening the Saquarema file (id 3305505) in QGIS and, for some reason, the shapes are projected in EPSG:32623 (UTM zone 23N), which causes them to appear in the middle of Greenland. If I force them to EPSG:31983 (UTM zone 23S), they land close to where they should be, but with the same distortions as in my first post.
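
A rough way to see why the hemisphere matters: southern UTM zones add a false northing of 10,000,000 m, so the same northing means very different latitudes depending on whether it is read as 23N or 23S. The sketch below uses only the standard library and a crude metres-per-degree approximation, and the northing value is a made-up figure chosen to be plausible for this area. It is not taken from the file and is no substitute for a proper projection library.

```python
UTM_SCALE = 0.9996                 # scale factor on the UTM central meridian
SOUTH_FALSE_NORTHING = 10_000_000  # metres added to northings in southern zones
METRES_PER_DEGREE = 111_133        # mean length of one degree of latitude (approx.)


def approx_latitude(northing_m, southern):
    """Very rough latitude in degrees implied by a UTM northing."""
    if southern:
        # Southern zones measure northing from a point 10,000 km south of the equator.
        northing_m = northing_m - SOUTH_FALSE_NORTHING
    return northing_m / UTM_SCALE / METRES_PER_DEGREE


# ASSUMPTION: hypothetical northing, not read from the original shapefile.
northing = 7_465_000

lat_as_23n = approx_latitude(northing, southern=False)
lat_as_23s = approx_latitude(northing, southern=True)

print(f"read as UTM 23N: {lat_as_23n:.1f} degrees")
print(f"read as UTM 23S: {lat_as_23s:.1f} degrees")
```

Read as zone 23N, the northing lands at about 67°N, which is Greenland at that longitude; read as zone 23S, it lands at about 23°S, on the coast of Rio de Janeiro state.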

rafapereirabr commented 1 year ago

I had the same experience, so we can conclude that the projection of the original data set is pretty bad. The fix we adopted in geobr comes close, but the data for the 2000 census tracts still have projection issues. At the moment I am not sure what we could do, but I'm open to suggestions on how to fix this.

rafapereirabr commented 1 year ago

I've now talked to our colleagues at IBGE and the data is indeed a bit patchy. It might be possible for us to 'reconstruct' the census tracts of the 2000 census using the "AMC" approach developed by @lucasmation, but I can only look into this later this year.