Closed kmunger closed 3 years ago
Ok sounds good. I can make a separate repository for replicating the chapter?
And the Google corpus looks like it's only 3.7 MB; if we're parsimonious with the rest, that should be fine.
Best would be to make a replication Rmd file for the chapter, and see if that works as a vignette. We can then move that and the data objects on which it depends to a companion package that depends on sophistication, and that will not be on CRAN.
The chapter file reads in the SCOTUS and CR data, which are about half a gig combined. I could considerably downsample these and put them in the main sophistication package, or we could put the larger files in the separate repository.
OK, that's too big for a package data object, even a non-CRAN one. We can either park them on a server and access them using
load(url("http://wherethedatais.server.com/bigassdataobkect.Rdata"))
or use the new download function in https://github.com/quanteda/quanteda.corpora.
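For the first option, a minimal sketch of a download-and-cache helper might look like the following. The function name `load_remote_rdata` and its arguments are hypothetical, not part of sophistication or quanteda.corpora, and the default cache location assumes R >= 4.0.0 (for `tools::R_user_dir()`).

```r
# Hypothetical helper: download a large .RData file once, cache it locally,
# and load its objects into the calling environment.
load_remote_rdata <- function(url,
                              cache_dir = tools::R_user_dir("sophistication", "cache"),
                              envir = parent.frame()) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  dest <- file.path(cache_dir, basename(url))
  if (!file.exists(dest)) {
    # mode = "wb" keeps the binary file intact on Windows
    utils::download.file(url, destfile = dest, mode = "wb")
  }
  # load() invisibly returns the names of the objects it restored
  load(dest, envir = envir)
}
```

Caching means the half-gig download happens only the first time the replication file is run; later runs read the local copy.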
This would make replicating the chapter impossible... Since this is only a CRAN issue, let this sit while I think about the best way to offload them. I also need to check that the Google corpus is still not going to break the 5 MB limit.
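One quick way to check the size, sketched here with a hypothetical helper `dir_size_mb` and an assumed `data/` directory in the package source:

```r
# Hypothetical helper: total size, in MB, of all files under a directory.
dir_size_mb <- function(path) {
  files <- list.files(path, recursive = TRUE, full.names = TRUE)
  sum(file.size(files)) / 1024^2
}

# Example call, assuming the package's data objects live in "data/":
# dir_size_mb("data")
```

This measures the files on disk, so it is only a rough guide to the size of the built package.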