ropensci / rentrez

talk with NCBI entrez using R
https://docs.ropensci.org/rentrez

Allow the use of POST #89

Closed reedacartwright closed 7 years ago

reedacartwright commented 7 years ago

NCBI recommends using HTTP POST when the number of IDs is greater than about 200. Right now rentrez always uses GET.

reedacartwright commented 7 years ago

Here is a workaround I hacked together to upload IDs.

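epost <- function(db, id, config = NULL, ...) {
    uri <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi"
    # Extra arguments plus tool/email identification travel in the query string
    args <- list(..., email = "david.winter@gmail.com", tool = "rentrez")
    # The (possibly very long) ID list goes in a form-encoded POST body, not the URL
    body <- list(id = paste(id, collapse = ","), db = db)
    response <- httr::POST(uri, query = args, config = config, body = body, encode = "form")
    if (httr::status_code(response) >= 400) {
        stop("Posting ids failed.")
    }
    text <- httr::content(response, as = "text", encoding = "UTF-8")
    # Namespace the XML calls so the function works without library(XML)
    record <- XML::xmlParse(text, asText = TRUE)
    # Pull QueryKey and WebEnv by element name rather than relying on their order
    result <- list(
        QueryKey = XML::xpathSApply(record, "/ePostResult/QueryKey", XML::xmlValue),
        WebEnv   = XML::xpathSApply(record, "/ePostResult/WebEnv", XML::xmlValue)
    )
    class(result) <- c("web_history", "list")
    result
}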
dwinter commented 7 years ago

Hi @reedacartwright,

Did the advice to use POST come from NCBI? I was once discouraged from using POST (the HTTP verb) and told to use EPost (confusingly, via GET). If the policy has changed, I can incorporate something like this into the package.

dwinter commented 7 years ago

... actually, just as I say that, it seems POST is going to die (see the email from NCBI in #86).

reedacartwright commented 7 years ago

It's in the epost documentation:

UID list. Either a single UID or a comma-delimited list of UIDs may be provided. All of the UIDs must be from the database specified by db. There is no set maximum for the number of UIDs that can be passed to epost, but if more than about 200 UIDs are to be posted, the request should be made using the HTTP POST method.
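To see why the ~200-UID guideline matters, here is a rough base-R sketch that measures how long a GET URL becomes when every ID is packed into the query string. The helper names `get_url_length` and `fake_ids` are hypothetical, and the eight-digit IDs are made up purely to show scale; the exact URL length at which servers or proxies start rejecting requests varies, so only the trend is meaningful.

```r
# Length of a GET request URL that carries every ID in the query string
get_url_length <- function(ids, db = "pubmed") {
  base <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi"
  url <- paste0(base, "?db=", db, "&id=", paste(ids, collapse = ","))
  nchar(url)
}

# Hypothetical eight-digit UIDs (integers, so no scientific notation creeps in)
fake_ids <- function(n) as.character(10000000L + seq_len(n))

sapply(c(10, 200, 5000), function(n) get_url_length(fake_ids(n)))
# → 159 1869 45069
```

Each extra eight-digit ID adds nine characters (the ID plus a comma), so the URL grows linearly. A POST body carries the same `id=` string without that growth showing up in the URL.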

dwinter commented 7 years ago

Hmmm, will email the help desk and see if they can confirm the right way to do this given the apparent end of support for POST mentioned in the other issue.

cstubben commented 7 years ago

Any updates on this issue? I think entrez_post should definitely use httr::POST! It works fine with HTTPS, and if NCBI stops allowing POST requests in the future, then I guess they will drop the EPost utility. And rather than supporting POST in every utility, maybe just check efetch, esummary and elink, and if they are given more than 200 IDs, add a message suggesting entrez_post instead. I often see hacks like this, when people really should post all the IDs at once and then use retstart to get the next 10K records if needed.
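The "post once, then page with retstart" pattern can be sketched roughly as below, along with the suggested warning for large ID lists. This is a hedged illustration, not rentrez code: `page_starts`, `check_id_count`, `fetch_all` and the injected `fetch_page(retstart, retmax)` are all hypothetical names. In practice `fetch_page` would wrap a call such as `rentrez::entrez_fetch()` against the web history returned by the post, passing `retstart` and `retmax`; the 10,000 page size just follows the comment.

```r
# Offsets for paging through n stored records, retmax at a time (retstart is 0-based)
page_starts <- function(n, retmax = 10000) {
  if (n <= 0) return(integer(0))
  seq.int(0L, n - 1L, by = retmax)
}

# Hypothetical guard in the spirit of the suggestion: nudge users toward posting
check_id_count <- function(id, limit = 200) {
  too_many <- length(id) > limit
  if (too_many) {
    message("More than ", limit, " IDs supplied; consider entrez_post() and a web history instead.")
  }
  invisible(too_many)
}

# Fetch every record page by page; fetch_page(retstart, retmax) stands in for a real E-utilities call
fetch_all <- function(n, fetch_page, retmax = 10000) {
  pages <- lapply(page_starts(n, retmax), function(start) fetch_page(start, retmax))
  unlist(pages)
}

# Offline usage with a fake fetcher that returns record numbers for a 25,000-record set
fake_fetch <- function(start, retmax) start + seq_len(min(retmax, 25000 - start))
records <- fetch_all(25000, fake_fetch)
```

Passing the fetcher in as a function keeps the paging arithmetic testable without network access.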

dwinter commented 7 years ago

Hi @cstubben ,

I have emailed NCBI and will see what they think. If it's something they are happy with, then I'm very keen to include it, but I also don't want to get offside with the people providing the API by doing something that (in some documents...) seems to be against their rules.

cstubben commented 7 years ago

The docs seem pretty clear to me, "if more than about 200 UIDs are to be provided, the request should be made using the HTTP POST method". I think most rentrez users are violating this and GETting more than 200 ids.

dwinter commented 7 years ago

Hi @reedacartwright and @cstubben ,

NCBI is going to support POST, so from now on rentrez will use POST for requests with more than 200 IDs. Users should not notice a difference (or have to do anything differently). I plan to get it onto CRAN this week, but if you want to check it out or test it beforehand, you can run:
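The behaviour described here (GET for short ID lists, POST above 200 IDs) can be sketched as a small dispatcher that only describes the request and sends nothing. This is not rentrez's actual internal code; `build_request_spec` and the shape of the list it returns are hypothetical, chosen so the switch can be tested offline.

```r
# Describe an E-utilities request: IDs go in the query string (GET) for short lists,
# or in a form-encoded body (POST) once there are more than `threshold` of them.
build_request_spec <- function(util, db, id, threshold = 200) {
  url <- paste0("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/", util, ".fcgi")
  params <- list(db = db, id = paste(id, collapse = ","))
  if (length(id) > threshold) {
    # Long ID lists travel in the request body, keeping the URL short
    list(method = "POST", url = url, query = list(), body = params)
  } else {
    list(method = "GET", url = url, query = params, body = NULL)
  }
}

spec <- build_request_spec("efetch", "pubmed", 1:500)
spec$method
# → "POST"
```

With httr, the POST branch would map onto something like `httr::POST(spec$url, body = spec$body, encode = "form")` and the GET branch onto `httr::GET(spec$url, query = spec$query)`, which is why callers would not need to change anything.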

 devtools::install_github("ropensci/rentrez", ref="develop")