ipeaGIT / geobr

Easy access to official spatial data sets of Brazil in R and Python
https://ipeagit.github.io/geobr/

Keep geometry data saved as R package data #228

Closed felipeangelimvieira closed 3 years ago

felipeangelimvieira commented 3 years ago

Hello, thank you for the package, it has been very useful to me.

For the past few weeks, the servers have been unstable, and some functions such as read_state aren't working as expected. Would it be possible to ship those data frames as binary data within the R package?

Reference: https://r-pkgs.org/data.html?q=data#data-sysdata (section 14.2)

Thank you

rafapereirabr commented 3 years ago

Hi @felipeangelimvieira. We have approximately 3,399 files available in geobr, so including all of them in the package is not possible given the size constraints imposed by CRAN policy. I'm now investigating which data sets are the most popular, and then I'll check whether we can include a few of them in the package.

I assume the most frequently used data sets are municipalities and states in 2010, but we'll see.

rafapereirabr commented 3 years ago

@JoaoCarabetta, do you know how this could be done for the Python version?

JoaoCarabetta commented 3 years ago

We can store some data with the package. Then, I just need to tweak the download function to choose the cached data instead.
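The fallback described here (for the Python package) can be sketched in R as well. This is a minimal, hypothetical illustration, not geobr's actual code: `read_with_fallback()`, `download_fun` and `cache_file` are assumed names. The idea is to try the remote download first and, only if it fails, load a copy saved as an `.rda` file that ships with the package.

```r
# Hypothetical helper (not part of geobr): try the remote source first and
# fall back to a locally bundled .rda copy if the download raises an error.
read_with_fallback <- function(download_fun, cache_file) {
  tryCatch(
    download_fun(),
    error = function(e) {
      message("Remote download failed (", conditionMessage(e), "); using cached copy.")
      env <- new.env()
      # load() restores the saved objects into `env` and returns their names
      obj_names <- load(cache_file, envir = env)
      get(obj_names[[1]], envir = env)
    }
  )
}

# Usage: simulate an unavailable server with a downloader that always errors
cache <- tempfile(fileext = ".rda")
df <- data.frame(code_state = c(11, 12))
save(df, file = cache, compress = "xz")
res <- read_with_fallback(function() stop("server unavailable"), cache)
```

Passing the downloader as a function keeps the sketch independent of any particular server or URL; in a real package the cache path would typically come from `system.file()` pointing at a file inside the installed package.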

rafapereirabr commented 3 years ago

Good to know. Once I decide on the data sets we can include in the package, I'll post an update here.

rafapereirabr commented 3 years ago

Just as a test, I've saved the municipality data with simplified borders as a compressed .rda file, and the file is over 9 MB. This is too large: CRAN policy requires the package to be at most 5 MB.

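library(geobr)
library(tools)

# Download all municipalities with simplified borders
df <- read_municipality(code_muni = 'all', simplified = TRUE, showProgress = TRUE)

# Save using xz compression at the highest compression level (9)
save(df, file = 'munis_2010.rda', compress = 'xz', compression_level = 9)

# Report size (bytes), ASCII flag, compression type and format version of the .rda files in '.'
checkRdaFiles('.')
#>                     size ASCII compress version
#> ./munis_2010.rda 9774744 FALSE       xz       3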
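To see whether a different compression method would bring the file under the limit, one could save the same object with each method that `save()` supports and compare the resulting sizes. This is a hypothetical sketch: `compare_rda_sizes()` is an illustrative name, and treating the 5 MB limit as 5 × 1024² bytes is an assumption.

```r
# Hypothetical helper: save `obj` with each compression method supported by
# save() and report the file size against an assumed 5 MB limit (5 * 1024^2 bytes).
compare_rda_sizes <- function(obj, limit_mb = 5) {
  methods <- c("gzip", "bzip2", "xz")
  sizes <- vapply(methods, function(m) {
    path <- tempfile(fileext = ".rda")
    on.exit(unlink(path))  # remove the temporary file after measuring it
    save(obj, file = path, compress = m, compression_level = 9)
    file.size(path)
  }, numeric(1))
  data.frame(
    method = methods,
    size_mb = round(sizes / 1024^2, 2),
    under_limit = sizes <= limit_mb * 1024^2,
    row.names = NULL
  )
}

# Usage with the municipality object from the test above:
# compare_rda_sizes(df)
```

If none of the methods gets below the limit, as the xz result above suggests, compression alone won't solve the size problem.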
felipeangelimvieira commented 3 years ago

I see, so relying on the server may be the only option. I don't know the details of IPEA's IT infrastructure, but a blob storage service such as Azure Blob Storage or Amazon S3 might avoid these instability problems.

rafapereirabr commented 3 years ago

We have considered those options, but using Ipea's IT infrastructure is cheaper and gives us more speed and flexibility when making data updates or fixes. Our IT staff have made a few updates recently, so I hope we won't face any instability any time soon.

rafapereirabr commented 3 years ago

Closing this issue given the size test results above: even the simplified municipality data, saved with xz compression, is over 9 MB, well above CRAN's 5 MB limit.