ipeaGIT / geobr

Easy access to official spatial data sets of Brazil in R and Python
https://ipeagit.github.io/geobr/

Suggestion: input code_* to more read_*() functions #338

Closed baarthur closed 3 months ago

baarthur commented 6 months ago

Following the same logic as read_municipality(), read_weighting_area() and so on, I think other functions, especially those with heavy datasets, should offer the same functionality, so that we don't have to download the entire dataset before filtering it. Below is an example of my current workflow:

library(sf)
library(geobr)
library(dplyr)

shp_bhz <- read_municipality(3106200)

shp_footprint <- read_urban_area() %>% 
  dplyr::filter(code_muni %in% shp_bhz$code_muni) # %in% for either a single or multiple cities

shp_rm <- read_metro_area() %>% 
  filter(abbrev_state %in% c("MG", "ES"))

In addition, it would be nice if the standard code_* arguments accepted a vector of codes/names instead of a single value, so that, e.g., in the last bit of code above I could simply do shp_rm <- read_metro_area(code_state = c(31,32))

rafapereirabr commented 6 months ago

Hi @baarthur. A few quick clarifications:

  1. When users run read_municipality(3106200), the function downloads all municipalities of state 31 and then filters the requested observation. Most large data sets in geobr are stored as a separate file for each state. This is the case, for example, with municipalities, census tracts and weighting areas, and it speeds up online data transfer.
  2. However, most data sets are fairly small, so they are stored in a single file for the whole country. This is the case for metropolitan regions and urban areas.
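
The per-state layout in point 1 is workable because IBGE's 7-digit municipality codes begin with the 2-digit state code, so the state file to fetch can be derived from the code alone. A minimal sketch of that lookup, where state_from_muni() is a hypothetical helper for illustration and not a geobr function:

```r
# Hypothetical helper: derive the 2-digit state code from 7-digit
# IBGE municipality codes (the state code is the leading two digits).
state_from_muni <- function(code_muni) {
  code_int <- as.integer(code_muni)
  # A valid municipality code has exactly seven digits.
  stopifnot(!anyNA(code_int), code_int >= 1000000L, code_int <= 9999999L)
  # Integer division drops the last five digits.
  code_int %/% 100000L
}

state_from_muni(3106200)               # Belo Horizonte, state 31 (MG)
state_from_muni(c(3106200, 3304557))   # vectorised over several codes
```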

I could add a code_state parameter to read_urban_area() and read_metro_area() to make them more convenient, but the functions would still download the data sets for the entire country (which are actually quite small).
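
A rough sketch of what such a convenience filter could look like, assuming the national file has already been loaded and carries code_state and abbrev_state columns (abbrev_state appears in the workflow above; code_state is an assumption here). filter_by_state() is a hypothetical helper, not geobr's implementation. It only illustrates that the filtering happens after the full download and that a vector of codes or abbreviations can be accepted:

```r
# Hypothetical helper (not part of geobr): subset an already-downloaded
# national table by one or more state codes or abbreviations.
# Assumes columns `code_state` and `abbrev_state` exist in `x`.
filter_by_state <- function(x, code_state = "all") {
  if (identical(code_state, "all")) return(x)
  if (is.numeric(code_state)) {
    keep <- x$code_state %in% code_state              # e.g. c(31, 32)
  } else {
    keep <- x$abbrev_state %in% toupper(code_state)   # e.g. c("MG", "ES")
  }
  if (!any(keep)) stop("No rows match the requested state(s).")
  x[keep, , drop = FALSE]
}

# Toy stand-in for a national data set (no download involved)
metro <- data.frame(
  name_metro   = c("Metro A", "Metro B", "Metro C"),
  code_state   = c(31, 32, 33),
  abbrev_state = c("MG", "ES", "RJ")
)

by_code   <- filter_by_state(metro, code_state = c(31, 32))
by_abbrev <- filter_by_state(metro, code_state = c("mg", "es"))
```

Both calls keep the same two rows; the whole table is still in memory before the subset is taken.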

P.S. I'm planning a different caching system for geobr that will keep data sets cached across R sessions. This means each data set will only be downloaded once, which will radically improve users' experience.
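
The idea of a cache that persists across R sessions can be illustrated with base R alone. This is only a sketch of the general technique, not the caching system planned for geobr: cached_read(), its arguments and the cache directory name are all hypothetical, and tools::R_user_dir() requires R 4.0 or later.

```r
# Hypothetical sketch: store the result of an expensive loader on disk
# so later calls (even in a new R session) skip the download.
cached_read <- function(key, loader,
                        cache_dir = tools::R_user_dir("geobr_sketch", which = "cache")) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(cache_dir, paste0(key, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))   # cache hit: read the local copy
  }
  obj <- loader()           # cache miss: run the slow step once
  saveRDS(obj, path)
  obj
}

# Stand-in loader that counts how often it runs; in practice `loader`
# could be something like function() geobr::read_metro_area()
calls <- 0
slow_loader <- function() {
  calls <<- calls + 1
  data.frame(code_state = c(31, 32))
}

tmp    <- tempfile("cache_demo")   # throwaway directory for the demo
first  <- cached_read("metro_area", slow_loader, cache_dir = tmp)
second <- cached_read("metro_area", slow_loader, cache_dir = tmp)
calls  # the loader ran only on the first call
```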

baarthur commented 6 months ago

okay, got it! makes sense to be that way then.

looking forward to the new implementations

rafapereirabr commented 3 months ago

This request has now been implemented as a test for the read_metro_area() function in the dev version of the geobr R package. Please let me know if you find any bugs.

# filter by state code
test_sf <- read_metro_area(code_state = 33)

test_sf <- read_metro_area(code_state = c(33, 35))

# filter by state abbrev
test_sf <- read_metro_area(code_state = 'RJ')

test_sf <- read_metro_area(code_state = c('RJ', 'SP'))

baarthur commented 3 months ago

working smoothly here, thanks!

rafapereirabr commented 3 months ago

This has now been implemented in read_urban_area() as well. It already works in the dev version and will be shipped in the next release on CRAN.

# filter by state code
test_sf <- read_urban_area(code_state = 33)

test_sf <- read_urban_area(code_state = c(33, 35))

# filter by state abbrev
test_sf <- read_urban_area(code_state = 'RJ')

test_sf <- read_urban_area(code_state = c('RJ', 'SP'))