ipeaGIT / geobr

Easy access to official spatial data sets of Brazil in R and Python
https://ipeagit.github.io/geobr/

geobr::read_municipality() in 2000 has duplicated municipality 3509908 #275

Closed MatthieuStigler closed 4 months ago

MatthieuStigler commented 2 years ago

This problem seems to lie in the source data rather than in the geobr package, but municipality 3509908 is strangely duplicated in 2000. It appears twice, with two different geometries (see the reprex below).

Ideally this could be repaired at the source, but if not, it might be good to add a warning in that case? It seems reasonable that the user can expect a unique municipality-year dataset?

Thanks!

library(dplyr, warn.conflicts=FALSE)
dat_2000 <- geobr::read_municipality(year = 2000, simplified = TRUE, code_muni = 3509908)
#> Loading required namespace: sf
#> Using year 2000
dat_2000
#> Simple feature collection with 2 features and 4 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -48.23922 ymin: -25.31183 xmax: -47.73522 ymax: -24.75571
#> Geodetic CRS:  SIRGAS 2000
#>     code_muni name_muni code_state abbrev_state                           geom
#> 113   3509908  Cananeia         35           SP MULTIPOLYGON (((-48.0035 -2...
#> 114   3509908  Cananéia         35           SP MULTIPOLYGON (((-48.23922 -...
sf::st_area(dat_2000)
#> Units: [m^2]
#> [1] 6.231309e+04 1.244208e+09

plot(sf::st_geometry(dat_2000[1,]))

plot(sf::st_geometry(dat_2000[2,]))

Created on 2022-02-08 by the reprex package (v2.0.1)
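Until the data is corrected, a user-side check and workaround might look like the following. This is a minimal sketch, not geobr behaviour: it uses a plain data frame as a stand-in for the two rows above (areas copied from the st_area() output), the `area_m2` column is a hypothetical helper, and it assumes the tiny polygon is the spurious one. If the small feature is actually a legitimate part of the municipality, merging the two geometries would be the better fix.

```r
# Toy stand-in for the two rows returned above (not real geobr output).
dat <- data.frame(
  code_muni = c(3509908, 3509908),
  name_muni = c("Cananeia", "Canan\u00e9ia"),
  area_m2   = c(6.231309e+04, 1.244208e+09)  # hypothetical helper column
)
# With the real sf object, the helper column could be filled with:
# dat_2000$area_m2 <- as.numeric(sf::st_area(dat_2000))

# Warn if any municipality code appears more than once
dup_codes <- unique(dat$code_muni[duplicated(dat$code_muni)])
if (length(dup_codes) > 0) {
  warning("Duplicated code_muni: ", paste(dup_codes, collapse = ", "))
}

# Assumption: keep only the largest feature for each code
dat <- dat[order(dat$code_muni, -dat$area_m2), ]
dedup <- dat[!duplicated(dat$code_muni), ]
```

Here `dedup` keeps a single row for 3509908, the one with the larger area.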

rafapereirabr commented 2 years ago

Hi @MatthieuStigler. Thank you for the heads up. The original data provided by IBGE comes with various issues like this one. One of the benefits of geobr is precisely to get rid of these problems and make a clean version of the data easily available. So thanks for pointing us to this issue. I'll address it in the next round of updates / corrections.

MatthieuStigler commented 2 years ago

great, glad to hear you can address this at the geobr level at least! Thanks for all the good work :-)

As far as I could see, this was the only duplicate. I checked by row-binding the output of mutate(data = map(year, ~geobr::read_municipality(year = ., simplified = TRUE))) for the years 2000 to 2020 (except 2001), and then running add_count(code_muni, year) %>% filter(n > 1) on the combined dataset.
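For reference, the duplicate check can be sketched on toy data as follows. This is only an illustration under assumptions: `all_years` is a hypothetical stand-in for the row-bound yearly output (no data is downloaded), the `year` column is assumed to have been added by hand, and the second municipality code is arbitrary.

```r
library(dplyr, warn.conflicts = FALSE)

# Toy stand-in for the combined yearly data. In the real check, each piece
# would come from geobr::read_municipality(year = y, simplified = TRUE),
# with a year column added before binding.
all_years <- bind_rows(
  tibble(year = 2000, code_muni = c(3509908, 3509908, 3550308)),
  tibble(year = 2010, code_muni = c(3509908, 3550308))
)

# Count rows per municipality-year and keep only the repeated combinations
dups <- all_years %>%
  add_count(code_muni, year) %>%
  filter(n > 1)
```

In this toy input, `dups` contains only the two rows for 3509908 in 2000, because every other municipality-year combination is unique.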

rafapereirabr commented 4 months ago

Hi @MatthieuStigler. This issue has now been fixed. I'm sorry it took so long, but I believe it is now fixed for good. Please let me know if the problem persists or if you find this or similar issues elsewhere in the package.

rafapereirabr commented 4 months ago

Oops, reopening this issue because there is now a problem with column class incompatibility. I'll fix this tomorrow.

df <- geobr::read_municipality(year = 2000)

Error in data.table::rbindlist(files, fill = TRUE) : Class attribute on column 8 of item 5 does not match with column 8 of item 1.
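This kind of error generally means that the same column carries a different class attribute in two of the tables being bound. The thread does not say which column or classes were involved in geobr, so the sketch below uses a hypothetical `when` column (Date in one table, POSIXct in the other) purely to illustrate the mechanism and one possible remedy, which is to coerce the columns to a common class before binding.

```r
library(data.table)

# Two toy tables whose `when` column has different classes (hypothetical example)
a <- data.table(id = 1L, when = as.Date("2000-01-01"))
b <- data.table(id = 2L, when = as.POSIXct("2000-01-02", tz = "UTC"))

# Binding them as-is is expected to fail with a class-attribute error;
# tryCatch() captures the message instead of stopping the script.
err <- tryCatch(
  rbindlist(list(a, b), fill = TRUE),
  error = function(e) conditionMessage(e)
)

# Harmonise the class first, then bind
b[, when := as.Date(when)]
out <- rbindlist(list(a, b), fill = TRUE)
```

After the coercion both tables share the Date class, so `out` has two rows and a single Date column.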

rafapereirabr commented 4 months ago

fixed

MatthieuStigler commented 4 months ago

Thank you very much for your great work, Rafael!