tc / RMongo

R client to interface with MongoDB

Can't install RMongo 0.1.0 #38

Closed dan18 closed 8 years ago

dan18 commented 8 years ago

Hi,

I have some processes written as R scripts that read data from MongoDB servers. Everything worked well with MongoDB 2.x, but we recently upgraded to version 3. From what I saw, this package can work with version 3, but only in its newest version (0.1.0), which I haven't seen on CRAN yet. The problem is that I get an error when installing it with devtools directly from GitHub. I searched the web and found a similar thread on Stack Overflow, but it had no answer. So I don't know whether there is something else I have to do/install/update, or whether it just doesn't work.

This is what I tried:

> install_github("tc/RMongo")
Downloading GitHub repo tc/RMongo@master
Installing RMongo
"C:/PROGRA~1/R/R-31~1.2/bin/x64/R" --no-site-file --no-environ --no-save --no-restore CMD INSTALL  \
  "C:/Users/XX/AppData/Local/Temp/Rtmp2vwmW4/devtools3849eb7fc/tc-RMongo-e65a0cf" --library="C:/Users/XX/Documents/R/win-library/3.1" --install-tests 

* installing *source* package 'RMongo' ...
** libs
no DLL was created
ERROR: compilation failed for package 'RMongo'
* removing 'C:/Users/XX/Documents/R/win-library/3.1/RMongo'
* restoring previous 'C:/Users/XX/Documents/R/win-library/3.1/RMongo'
Error: Command failed (1)

Session Info:

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] devtools_1.9.1

loaded via a namespace (and not attached):
[1] bitops_1.0-6   digest_0.6.8   httr_0.6.1     magrittr_1.5   memoise_0.2.1  RCurl_1.95-4.7 stringi_0.5-5  stringr_1.0.0  tools_3.1.2

Additionally, if it helps, I have the latest Java version installed (Java SE Development Kit 8u60) and rJava version 0.9-7.

I'd be very grateful if someone could help me resolve this issue.

Regards, dan18

dan18 commented 8 years ago

It has been a while since I opened this issue, but I want to share some of my experience, which might help whoever is having the same problem connecting R to Mongo 3.x. I did succeed in installing version 0.1.0 on Linux, but when I tried to fetch some data from Mongo 3, the connection still failed to authenticate, and I got frustrated...

How I solved it: I found that Python has a package (PyMongo) that supports the Mongo 3.x authentication method. So I wrote a small Python script containing just the connection and the query itself, which my R script runs via the rPython package. The Python script fetches all the data and stores it in a list of JSON strings, which it then hands back to my R script as a vector for further processing. Perhaps it's a bit of a hassle, but it completely solved my issue.

Regards, dan18

EnricowithR commented 7 years ago

Hi dan18,

I have the same problem, could you share the code that you wrote for this Python script?

thanks, Enrico

dan18 commented 7 years ago

Hi EnricowithR,

First, you must have Python 2.7 installed with the packages bson and pymongo. Additionally, you need R with the package rPython.

The Python code is the following:

from pymongo import MongoClient
from bson.json_util import dumps

# Connect to the server and authenticate against the admin database.
client = MongoClient('HOST_SERVER_NAME', 27017)
client.admin.authenticate(name='USERNAME', password='PASSWORD', mechanism='SCRAM-SHA-1')

# Run the query and serialize each returned document to a JSON string.
db = client.DB_NAME
cursor = db.COLLECTION_NAME.find(...)
docs = [dumps(document) for document in cursor]
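
A side note for anyone on a newer stack: the explicit `authenticate()` call above belongs to the PyMongo 3.x API and was removed in PyMongo 4.0, where credentials are passed to `MongoClient` instead, for example inside a connection URI. Below is a minimal sketch of building such a URI. It assumes Python 3 (on Python 2.7, `quote_plus` lives in `urllib` rather than `urllib.parse`), the helper name `build_mongo_uri` is hypothetical, and using `admin` as the authentication database is an assumption carried over from the snippet above.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host, port, username, password,
                    auth_db="admin", mechanism="SCRAM-SHA-1"):
    """Return a MongoDB connection URI with percent-encoded credentials.

    Hypothetical helper: reserved characters such as '@' or '/' in the
    username or password must be escaped, otherwise the URI is ambiguous.
    """
    user = quote_plus(username)
    pwd = quote_plus(password)
    return "mongodb://{}:{}@{}:{}/?authSource={}&authMechanism={}".format(
        user, pwd, host, port, auth_db, mechanism)


# Placeholder values only; substitute your own host and credentials.
uri = build_mongo_uri("HOST_SERVER_NAME", 27017, "USERNAME", "p@ss/word")
print(uri)
```

The resulting string can be handed to `MongoClient(uri)` in place of the separate host, port, and `authenticate()` arguments; the rest of the script stays the same.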

R code is the following:

require(rPython)
python.load("PATH_TO_PYTHON_SCRIPT")
docs <- python.get("docs")

What you get is basically a vector of JSON documents. You can transform it into a more convenient format, such as a data frame or a data.table:

# Transform into a data frame (slow)
require(plyr)
require(jsonlite)
mongo_data <- do.call("rbind.fill", lapply(docs, function(x) as.data.frame(t(fromJSON(x)), stringsAsFactors = FALSE)))

# Transform into a data.table (fast)
require(data.table)
require(jsonlite)
mongo_data <- rbindlist(lapply(docs, function(x) as.data.frame(t(fromJSON(x)), stringsAsFactors = FALSE)), fill = TRUE)

Please keep in mind that if your result is very big (millions of documents?), R might throw an error that it cannot allocate such a big vector. Perhaps it has something to do with memory or with some vector size limit; I haven't explored the reason. Anyway, I worked around it by looping over the resulting Python array and building the final data frame from smaller partitions. Hope that helps.
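
The partitioning idea can be sketched roughly as follows. This is only an illustration of splitting the `docs` list into fixed-size batches on the Python side; the loop described above was run from R, and the helper name `iter_batches`, the batch size, and the sample documents are all made up for the example.

```python
import json


def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Stand-in for the `docs` list of JSON strings produced by the query script.
docs = ['{"_id": 1, "a": "x"}', '{"_id": 2, "b": "y"}', '{"_id": 3, "a": "z"}']

batch_sizes = []
for batch in iter_batches(docs, 2):
    # Each batch is small enough to be fetched and converted on its own.
    rows = [json.loads(doc) for doc in batch]
    batch_sizes.append(len(rows))

print(batch_sizes)
```

Each batch could then be pulled into R separately and converted with the same `rbindlist` call, with the partial tables combined at the end.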

Regards, dan18