Closed dan18 closed 8 years ago
It has been a while since I opened this issue, but I want to share my experience in case it helps anyone else who has trouble connecting R to Mongo 3.x. I did manage to install version 0.1.0 on Linux, but when I tried to fetch data from Mongo 3, the connection still failed to authenticate, and I got frustrated...
How I solved it: I found that Python has a package (PyMongo) that supports the Mongo 3.x authentication method. I therefore wrote a small Python script containing just the connection and the query, which my R script runs via the rPython package. The Python script collects all the documents in a list of JSON strings, which rPython then delivers to my R script as a vector for further processing. It's perhaps a bit of a hassle, but it completely solved my issue.
Regards, dan18
Hi dan18,
I have the same problem. Could you share the code that you wrote for this Python script?
Thanks, Enrico
Hi EnricowithR,
First, you need Python 2.7 installed with the pymongo package (the bson module used below ships with pymongo, so it does not need a separate install). You also need R with the rPython package.
The Python code is as follows:
from pymongo import MongoClient
from bson.json_util import dumps
client = MongoClient('HOST_SERVER_NAME',27017)
client.admin.authenticate(name='USERNAME', password='PASSWORD', mechanism='SCRAM-SHA-1')
db = client.DB_NAME
cursor = db.COLLECTION_NAME.find(...)
docs = [dumps(document) for document in cursor]
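A note for readers on newer PyMongo releases: the `authenticate` call used above was removed in PyMongo 4.x, where credentials are passed to `MongoClient` directly or embedded in a connection URI. The following is only a minimal sketch of building such a URI with the standard library. The helper name `build_mongo_uri` and the sample password are hypothetical, and the placeholders are the same ones used in the script above.

```python
from urllib.parse import quote_plus


def build_mongo_uri(user, password, host, port=27017,
                    auth_db="admin", mechanism="SCRAM-SHA-1"):
    """Build a MongoDB connection URI (hypothetical helper, not part of PyMongo).

    The user name and password are percent-encoded so that characters
    such as '@' or ':' do not break the URI.
    """
    return "mongodb://{}:{}@{}:{}/?authSource={}&authMechanism={}".format(
        quote_plus(user), quote_plus(password), host, port, auth_db, mechanism)


# Sample values only; substitute your own credentials and host.
uri = build_mongo_uri("USERNAME", "p@ss:word", "HOST_SERVER_NAME")
print(uri)
```

The resulting string can be passed as the single argument to `MongoClient(uri)` in place of the separate host, port, and `authenticate` lines.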
The R code is as follows:
require(rPython)
python.load("PATH_TO_PYTHON_SCRIPT")
docs <- python.get("docs")
What you get is basically a vector of JSON documents. You can transform it into a more convenient format such as a data frame or a data.table:
# Transform into dataframe (slow)
require(plyr)
require(jsonlite)
mongo_data <- do.call("rbind.fill",lapply(docs,function(x) as.data.frame(t(fromJSON(x)),stringsAsFactors=FALSE)))
# Transform into datatable (fast)
require(data.table)
require(jsonlite)
mongo_data <- rbindlist(lapply(docs,function(x) as.data.frame(t(fromJSON(x)),stringsAsFactors=FALSE)),fill=TRUE)
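Both R snippets rely on "fill" behaviour: documents that lack a field still get a column for it, holding a missing value. As a rough illustration of that idea only (not of how plyr or data.table implement it), here is a small standard-library Python sketch. The function name `docs_to_rows` and the sample documents are made up for the example.

```python
import json


def docs_to_rows(docs):
    """Parse JSON strings and pad every row to the union of all fields.

    Hypothetical helper: fields missing from a document become None,
    which mirrors the NA values produced by a fill-style row bind.
    """
    parsed = [json.loads(d) for d in docs]
    columns = []
    for row in parsed:
        for key in row:
            if key not in columns:  # keep first-seen column order
                columns.append(key)
    rows = [[row.get(col) for col in columns] for row in parsed]
    return columns, rows


# Two sample documents with partly different fields.
columns, rows = docs_to_rows(['{"a": 1, "b": "x"}', '{"a": 2, "c": true}'])
print(columns)
print(rows)
```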
Please keep in mind that if your result is very big (millions of documents?), R might throw an error saying that it cannot allocate such a big vector. Perhaps it has something to do with memory, or with some vector size limit; I haven't explored the reason. In any case, I worked around it by looping over the resulting Python list and building the final data frame from smaller partitions. Hope that helps.
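The partitioning workaround can be sketched as follows. This is only one possible way to do it, on the Python side, and is not the exact loop described above: the generator name `in_chunks` and the chunk size are assumptions chosen for illustration. Each chunk could then be serialized with `dumps` and fetched from R separately.

```python
from itertools import islice


def in_chunks(iterable, size):
    """Yield successive lists of at most `size` items from any iterable.

    Hypothetical helper: it works on a plain list as well as on a lazy
    iterator, so the full result never has to be held in one piece.
    """
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Stand-in data; in the script above this would be the query cursor.
chunks = list(in_chunks(range(7), 3))
print(chunks)
```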
Regards, dan18
Hi,
I have some processes written as R scripts that read data from MongoDB servers. Everything worked great on version 2.x, but we recently upgraded Mongo to version 3. From what I saw, this package can deal with version 3, but I have to install the newest version (0.1.0), which I haven't seen on CRAN yet. The issue is that I get an error when installing with devtools directly from GitHub. I searched the web and found a similar thread on Stack Overflow, but there was no answer there. So I don't know: perhaps there is something else I have to do/install/update... or it just doesn't work?
This is what I tried:
Session Info:
Additionally, if it helps, I have the latest Java version installed (Java SE Development Kit 8u60) and rJava version 0.9-7.
I'd be very grateful if someone could help me resolve this issue.
Regards, dan18