ropensci / sofa

Easy R interface to CouchDB
https://docs.ropensci.org/sofa/

db_alldocs is severely limited in the R sofa package because R character strings are limited to 2^31-1 bytes #64

Closed: ghost closed this issue 6 years ago

ghost commented 6 years ago

I have a fairly small Cloudant collection with about 2.6 million documents. I am trying to pull all of them into memory to work with the full data set, but db_alldocs fails in R with the error shown below.

Currently, my workaround is to pull everything using Python, dump it to a JSON text file, read that file in R with readLines, and convert each line to a list with lapply and fromJSON. This works, but it is annoying to switch between environments and to store data in intermediate files.
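For readers who want to reproduce the R half of that workaround, here is a minimal sketch. It assumes the exported file holds one JSON document per line and that fromJSON comes from the jsonlite package; neither is stated in the thread. The file contents below are made up for illustration.

```r
library(jsonlite)

# Stand-in for the file exported from Python: one JSON document per line.
# (Hypothetical content; a real export would hold the Cloudant documents.)
path <- tempfile(fileext = ".json")
writeLines(c(
  '{"_id": "a1", "value": 1}',
  '{"_id": "b2", "value": 2}'
), path)

# Read the file as a character vector with one element per line, so no
# single string has to hold the whole dump.
lines <- readLines(path)

# Parse each line into an R list.
docs <- lapply(lines, fromJSON)

length(docs)
docs[[1]][["_id"]]
```

Because each line becomes its own string, every element stays far below the 2^31-1 byte limit on a single R string, which is presumably why this route succeeds where one large response body does not.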

df <- db_alldocs(myCuishion, 'my-db-collection', include_docs = TRUE)
Error in readBin(self$content, character()) : 
  R character strings are limited to 2^31-1 bytes

Is there any possible way to address this limitation?

sckott commented 6 years ago

I'll have a look.

sckott commented 6 years ago

Adding the ability to write the response to disk instead of holding it within the R session.

sckott commented 6 years ago

@tumulurig3 try again after reinstalling with devtools::install_github("ropensci/sofa"). See the new disk parameter in db_alldocs and the newly added example for how to use it.

Does that solve your problem?
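A rough sketch of how the disk parameter might be used for the call from the original report. The wrapper name fetch_all_docs_to_disk and its path argument are hypothetical. It assumes a version of sofa that includes disk, and that the file receives the raw JSON response body.

```r
# Hypothetical helper: fetch all documents into a file instead of memory.
# Assumes a sofa version whose db_alldocs() accepts `disk`.
fetch_all_docs_to_disk <- function(cushion, dbname,
                                   path = tempfile(fileext = ".json")) {
  # With `disk` set to a file path, the response is written to that file
  # rather than read into a single R character string.
  sofa::db_alldocs(cushion, dbname = dbname, include_docs = TRUE, disk = path)
  path
}

# Usage (not run; needs a reachable CouchDB or Cloudant instance):
# f <- fetch_all_docs_to_disk(myCuishion, "my-db-collection")
# readLines(f, n = 10)  # peek at the first lines of the saved response
```

The saved file can then be processed in pieces, for example line by line with readLines, so that no single string approaches the 2^31-1 byte limit.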