rudeboybert / fivethirtyeight

R package of data and code behind the stories and interactives at FiveThirtyEight
https://fivethirtyeight-r.netlify.app/

Large/numerous datasets are exceeding CRAN pkg size restrictions #82

Closed rudeboybert closed 4 years ago

rudeboybert commented 4 years ago

Given the growth in the number and size of datasets, we're hitting CRAN package size restrictions. Our hacky workaround so far has been to include only the first 10 rows of the larger datasets (see here for a list of which ones). The latest round of data additions (see #79) brings many more large datasets.
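For illustration, the truncation workaround can be sketched roughly as below. This is only a sketch in base R: `big_data` is a made-up stand-in, not one of the package's datasets, and the reported sizes will vary by platform.

```r
# Hypothetical stand-in for one of the larger datasets (not real package data)
big_data <- data.frame(
  id = seq_len(100000),
  value = rnorm(100000)
)

# Keep only the first 10 rows, as the workaround described above does
preview <- head(big_data, 10)

# Compare the in-memory sizes of the full and truncated versions
format(object.size(big_data), units = "Kb")
format(object.size(preview), units = "Kb")
```

The truncated copy keeps the column names and types, so the documentation and examples still make sense, but any analysis on it is obviously incomplete.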

Two potential solutions are to either:

rudeboybert commented 4 years ago

@mariumtapal's code (slightly modified by @rudeboybert) to test-drive the functionality on the senators data. senators is one of the 10 very large datasets listed in the "Note" section here.

# Test drive of the user experience of installing two packages:
#
# - fivethirtyeight: for 10 very large datasets, includes only the first 10
#   rows. This is to get around CRAN package size restrictions.
# - fivethirtyeightdata: includes the full data for all 10 large datasets above.
#
# This test uses the senators dataset.

# First, uninstall fivethirtyeight package and restart R.

# Mimic installing fivethirtyeight from CRAN. This will take time since the package is still large.
remotes::install_github("mariumtapal/fivethirtyeight.test")

# Load fivethirtyeight
library(fivethirtyeight.test)

# senators not available
View(senators) 
?senators

# Mimic installing fivethirtyeightdata from CRAN. A popup will provide this code
install.packages("fivethirtyeightdata.test", repos = "https://mariumtapal.github.io/drat/", type = "source")

# Load fivethirtyeightdata
library(fivethirtyeightdata.test)

# senators now available
View(senators) 
?senators
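
One common way to produce the kind of install prompt mentioned above is to check for the companion package with `requireNamespace()` and print the install command if it is missing. The sketch below assumes that approach; `check_data_pkg()` is a hypothetical helper, not necessarily how the test packages implement the popup.

```r
# Hypothetical helper (an assumption, not the packages' actual code):
# returns TRUE if the companion data package is installed; otherwise
# prints install instructions and returns FALSE.
check_data_pkg <- function(pkg = "fivethirtyeightdata.test",
                           repo = "https://mariumtapal.github.io/drat/") {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(TRUE)
  }
  message(
    "The full dataset lives in the '", pkg, "' package. Install it with:\n",
    "install.packages(\"", pkg, "\", repos = \"", repo, "\", type = \"source\")"
  )
  FALSE
}

# Example: prints the install command when the data package is missing
check_data_pkg()
```

Because the check only calls `requireNamespace()`, it does not attach the data package or fail when the package is absent; it just tells the user what to run.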