ropensci-archive / dataone-restful

:no_entry: ARCHIVED :no_entry:

Introduction to the repository #1

Open cboettig opened 10 years ago

cboettig commented 10 years ago

Why create rdataone when we already have the perfectly good dataone R package on CRAN?

rdataone provides a direct implementation of the REST API for dataone member nodes, as documented here: http://mule1.dataone.org/ArchitectureDocs-current/apis/MN_APIs.html, using thin wrappers around the httr curl library, while the dataone package wraps Java libraries instead. The RESTful API is a simple, flexible, and lightweight alternative.

Thanks to the intuitive structure of the API and some conveniences of the httr curl wrappers, any function in the REST API can be implemented rather quickly and reasonably completely by anyone familiar with basic R functions, without knowledge of Java or advanced class structures. These functions primarily add a little syntactic sugar, reasonable defaults, and native documentation around the raw httr calls to the API.
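As a rough illustration of what such a thin wrapper might look like (a sketch under assumptions, not the code in this repo): the function names below are hypothetical, and the `/object/{pid}` path is my reading of the member node API docs linked above.

```r
# Sketch only: `d1_object_url()` and `d1_get_sketch()` are hypothetical names,
# not functions from rdataone or dataone.

# Build the URL for an object on a node. The pid is percent-encoded because
# identifiers such as DOIs contain ":" and "/".
d1_object_url <- function(node, pid) {
  base <- sub("/+$", "", node)  # drop trailing slashes
  paste0(base, "/object/", utils::URLencode(pid, reserved = TRUE))
}

# Thin wrapper: a default node plus an optional client certificate.
d1_get_sketch <- function(pid,
                          node = "https://knb.ecoinformatics.org/knb/d1/mn/v1/",
                          cert = NULL) {
  # `sslcert` is the libcurl option naming a client certificate file.
  cfg <- if (is.null(cert)) httr::config() else httr::config(sslcert = cert)
  resp <- httr::GET(d1_object_url(node, pid), cfg)
  httr::stop_for_status(resp)      # turn HTTP errors into R errors
  httr::content(resp, as = "raw")  # return the object's bytes
}
```

Only the URL builder is pure; `d1_get_sketch()` needs httr installed and network access, and the pid you pass would be a real identifier on the chosen node.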

So far, I've just added a few main-use functions as a proof of principle: search, upload, download, and archive, the last being an example of a function not yet in the dataone package. Nothing is stable yet. Despite a relatively close mapping to the API, there are various finer points to think through about how best to simplify without losing flexibility, such as the handling of the base URLs for the different member nodes. For instance, the functions take the node as an optional argument, defaulting to the central node,

node = c("https://cn.dataone.org/cn/v1",
              "https://knb.ecoinformatics.org/knb/d1/mn/v1/")

while a shorthand reference like "knb" or "test" would probably be preferable to most users (though maintaining the ability to give a full URL for maximum flexibility as well).
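One way the shorthand could work, sketched under assumptions: the lookup table and the `resolve_node()` name are hypothetical, the short names `"cn"` and `"knb"` are just illustrative labels, and only the two URLs come from the snippet above.

```r
# Hypothetical lookup table: the URLs are the two quoted above; the short
# names are illustrative assumptions.
d1_nodes <- c(
  cn  = "https://cn.dataone.org/cn/v1",
  knb = "https://knb.ecoinformatics.org/knb/d1/mn/v1/"
)

# Resolve a shorthand name, or pass a full URL through unchanged.
resolve_node <- function(node = "cn") {
  stopifnot(is.character(node), length(node) == 1)
  if (grepl("^https?://", node)) {
    return(node)  # full URL: keeps maximum flexibility
  }
  if (!node %in% names(d1_nodes)) {
    stop("Unknown node '", node, "'; use one of: ",
         paste(names(d1_nodes), collapse = ", "), ", or a full URL")
  }
  unname(d1_nodes[node])
}
```

With this, `resolve_node("knb")` returns the KNB base URL, `resolve_node()` falls back to the central node, and any string starting with `http://` or `https://` is used as given.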

Authentication is handled, as in curl calls, by providing a path to the SSL certificate -- no utilities at this time are provided to assist with this. Still, it provides a nice sandbox to play with the REST API.

emhart commented 10 years ago

Whoa. When did the REST API go up? Last time I talked to Matt he said there was no REST API up and there were issues with httrs and ssl that were preventing the creation of a package.

karthik commented 10 years ago

You read my mind. Was about to email you and ask for the motivation for starting this one. It's still unclear to me how we get around using rJava (or was that just because the CI team wanted to keep most of their code wrapped up in Java) and how we deal with the certificates that need to be renewed every 24 hours.

> no utilities at this time are provided to assist with this.

If this works, why need any additional utilities?

cboettig commented 10 years ago

@karthik Not sure I follow the question, but we get around rJava by making direct httr calls (GET, POST, PUT, etc.) to the member node API. So no rJava dependency. Make sense?

Yeah, the certificates are annoying, particularly with the 24-hour expiration. Perhaps we can automate that entirely in R after the first use: since the browser can remember the logon credentials, we'd just need to have R check whether the current certificate has expired and, if so, download a new one. Not sure how to do that, but maybe Duncan has some ideas. That's the kind of "utility" I had in mind.
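The "has it expired?" half of that could be as small as the sketch below. `cert_needs_renewal()` is a hypothetical helper; it assumes the caller already knows the certificate's expiry time, and both reading that time from the certificate file and downloading a new certificate are left out.

```r
# Hypothetical helper: decide whether a certificate should be refreshed.
# `not_after` is the certificate's expiry time as POSIXct; extracting it from
# the certificate file is deliberately not part of this sketch.
cert_needs_renewal <- function(not_after, now = Sys.time(), margin_secs = 600) {
  stopifnot(inherits(not_after, "POSIXct"), inherits(now, "POSIXct"))
  # Renew slightly early so a request never starts with a cert about to lapse.
  as.numeric(difftime(not_after, now, units = "secs")) <= margin_secs
}
```

The `margin_secs` default of ten minutes is an arbitrary choice for illustration.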

sckott commented 10 years ago

Are certificates new every time? I assume so.

emhart commented 10 years ago

I just got the sense when I talked to @mbjones that the certificates would be problematic, and that rCurl couldn't handle them without an update. That was a couple of months ago, so maybe it's changed. I agree that the REST API is way easier than just wrapping Java calls with rJava. But are there any plans to change the API to stop requiring SSL certificates every 24 hours?

mbjones commented 10 years ago

The REST API has been up all along (since July 2012, when we went into production). Sorry for the confusion. I talked to Karthik and Duncan about using rCurl two years ago at the DataONE all-hands meeting, and that was when they told me that rCurl doesn't support client-side certificates. That's why we went with wrapping the Java library, as a means to get up and running quickly. I've always thought it would be better to get rid of the Java dependency in the dataone package, and was planning on folding this in when rCurl supported it. Seeing Carl be able to use client-side certs with rCurl has been a revelation to me, so I think we should move to that ASAP. I would far prefer that we work in the original dataone package rather than duplicating it, as this will allow us to keep the client functional through incremental releases that replace API calls with rCurl-based versions, and it avoids duplicating effort. Once all of the Java dependencies are gone, we can drop the Java libs and the rJava requirement, which would be welcome.

So @cboettig, would you be willing to work on this in the original dataone package? As you know, that is an SVN repo in the dataone repository, but if you want to move it to GitHub I'd be willing to do that -- I understand GitHub is the flavor du jour for rOpenSci.

karthik commented 10 years ago

I like this plan.

@mbjones Back then rCurl did not support client-side certificates, but the CI team had already taken the rJava approach (that's when we saw your demo).

cboettig commented 10 years ago

@mbjones definitely agree it would be better to merge into dataone. As I mention above, this repo was very much scratch-pad proof-of-principle.

Happy to merge this into dataone. Does give us some challenges:

Yeah, it's been half a decade since I've used svn and I've gotten pretty used to cheap merging and branching, so it would be easier to develop on GitHub if that works for you (and Rob Nahf, assuming he is still involved?). @mbjones perhaps you can put dataone up on GitHub then? While we could just add these REST-based functions I've called d1_upload, d1_get, d1_update, etc. to the dataone R/ directory, I'd love to hear input on how to handle the namespace issues I mention above.