Closed gadepallivs closed 7 years ago
Hi Monty,
I'll have a look and see if there is anything rentrez can do in cases like this. In the meantime, yes, try to chunk very large requests into several smaller ones. retmax and retstart are the arguments to use; there is an example in the vignette.
Hi David,
I passed my query to the function you created in #70. It still throws me the same error in as.vector: cannot coerce type 'externalptr' to vector of type 'character'
Hi Monty, I'm afraid I can't reproduce the error. Can you tell me what you get when you don't set parsed=TRUE?
Hi David,
I tried that; I get the same error after repeated attempts, irrespective of parsed = TRUE or FALSE. Were you able to reproduce the 2nd error with entrez_summary() in the question above? Error: parse error: premature EOF (right here) ------^. It used to work fine for me as well; not sure what is wrong in my query now. I have updated R to 3.2.3, not sure if that matters. My code works for smaller queries but breaks for large ones.
query = "BRAF[Title/Abstract] OR melanoma[Title/Abstract] AND Cancer[Title]
AND (2000[PDAT] :2015[PDAT])"
fetch_and_parse <- function(start) {
cat(start, "\r") # let the user know where we are up to
pubmed_records <-
entrez_fetch(
db = "pubmed", web_history = pubmed_search$web_history,
retstart = start, retmax = 1000, rettype =
"xml"
)
parse_pubmed_xml(pubmed_records)
}
pubmed_search <-
entrez_search(db = "pubmed", term = query, use_history = TRUE)
pubmed_parsed <- lapply(pubmed_search, fetch_and_parse)
dput(pubmed_search)
Hi Monty,
The error message for entrez_fetch is from the XML package, so I don't think you should be getting it if parsed is set to FALSE.
If you can reliably get this error we might be able to get to the bottom of it. Can you run the following code? If everything goes fine, er will be NULL. If it goes wrong, er will be the raw file that is messing everything up.
query = "BRAF[Title/Abstract] OR melanoma[Title/Abstract] AND Cancer[Title]
AND (2000[PDAT] :2015[PDAT])"
pubmed_search <- entrez_search(db = "pubmed", term = query,
use_history = TRUE)
did_it_parse <- function(recs){
flag <- tryCatch(
XML::xmlTreeParse(recs, useInternalNodes=TRUE),
error = function(e) "FAIL"
)
if(typeof(flag) == "character"){
return(FALSE)
}
TRUE
}
trap_error <- function(){
res <- rentrez:::make_entrez_query(
"efetch", config=NULL,
WebEnv=pubmed_search$web_history$WebEnv,
query_key=pubmed_search$web_history$QueryKey,
rettype="xml",
db="pubmed", retmax=1000)
cat("res is a '", typeof(res), "'\n")
if(did_it_parse(res)){
return(invisible())
}
res
}
er <- trap_error()
For making progress on your own work, you just need to use fewer records at a time (change retmax to suit) so these large files don't cause these errors.
Hi David, thank you for your time. The er is NULL.
I am posting a traceback() of the errors I get in the question. I will work on troubleshooting and see if I can provide you a repeatable error. Below is the traceback(), just in case it helps to pinpoint the error. Is the stop("HTTP failure: ...") at frame 12 expected? Could it be that the time it takes to fetch PubMed records for a large list of PMIDs is causing the connection to time out and thus throwing the error?
Error in as.vector(x, "character") :
cannot coerce type 'externalptr' to vector of type 'character'
16 as.character.default(X[[i]], ...)
15 FUN(X[[i]], ...)
14 lapply(list(...), as.character)
13 .makeMessage(..., domain = domain)
12 stop("HTTP failure: ", req$status_code, "\n", message, call. = FALSE)
11 entrez_check(response)
10 (function (util, config, interface = ".fcgi?", by_id = FALSE,
...)
{
uri <- paste0("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/", ...
9 do.call(make_entrez_query, args)
8 entrez_fetch(db = "pubmed", web_history = pubmed_search$web_history,
retstart = start, retmax = 1000, rettype = "xml") at .active-rstudio-document#6
7 FUN(X[[i]], ...) at .active-rstudio-document#4
6 lapply(pubmed_search, fetch_and_parse) at .active-rstudio-document#16
5 eval(expr, envir, enclos)
4 eval(ei, envir)
3 withVisible(eval(ei, envir))
2 source("~/.active-rstudio-document")
1 source("~/.active-rstudio-document")
Thanks Monty, this is helpful.
I'm not sure why you are doing lapply(pubmed_search, fetch_and_parse)? fetch_and_parse takes a starting number as its argument, not a pubmed search.
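In other words, the lapply should run over the start offsets rather than over the search object, something like this (a sketch; the 1000-record chunk size matches fetch_and_parse's retmax):

```r
# fetch_and_parse expects a starting record number, so iterate over
# offsets 0, 1000, 2000, ... up to the total hit count, rather than
# over the elements of the search object itself.
starts <- seq(0, pubmed_search$count - 1, by = 1000)
pubmed_parsed <- lapply(starts, fetch_and_parse)
```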
Also, it looks like the error occurs when rentrez tries to deal with the error message. Could you install a tweaked version:
devtools::install_github("rentrez", "ropensci", ref="monty")
Presuming these errors persist, let me know what error messages you now get from entrez_fetch.
Hi David,
Thanks for clarifying the fetch_and_parse function input. So, if I have 7000 hits, to parse all the data do I need to run this function in a loop over 7 start numbers (2, 1002, 2002, and so on)? I use entrez_search(), entrez_fetch(), and entrez_summary() as part of the R Shiny application that I shared with you via email. (1) As part of the R Shiny application, the code breaks at entrez_fetch(). (2) However, if I run the above functions individually to test them separately, entrez_fetch() works depending on how large the number of hits is, but now the program breaks at entrez_summary(). In both cases, the error is cannot coerce type 'externalptr' to vector of type 'character', and occasionally entrez_summary() also throws Error: parse error: premature EOF.
With the tweaked version installed, the application breaks at entrez_fetch() with the error HTTP failure: 400. However, when I take these functions out separately and run them individually, entrez_fetch() works, but entrez_summary() consistently breaks with Error: HTTP failure: 502.
traceback()
5:stop("HTTP failure: ", req$status_code, call. = FALSE)
4:entrez_check(response)
3:(function (util, config, interface = ".fcgi?", by_id = FALSE, ...)
{
uri <- paste0("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/",
util, interface)
args <- list(..., email = entrez_email(), tool = entrez_tool())
if (by_id) {
ids_string <- paste0("&id=", args$id, collapse = "")
args$id <- NULL
uri <- paste0(uri, ids_string)
}
else {
if ("id" %in% names(args)) {
args$id <- paste(args$id, collapse = ",")
}
}
response <- httr::GET(uri, query = args, config = config)
entrez_check(response)
return(httr::content(response, as = "text"))
})(
"esummary", db = "pubmed", config = NULL, retmode = "json",
version = "2.0",
WebEnv =
"NCID_1_4959508_130.14.22.215_9001_1453991749_1393371199_0MetA0_S_MegaStore_F_1",
query_key = "1"
)
2:do.call(make_entrez_query, args)
1:entrez_summary(
db = "pubmed", version = "2.0", web_history = pubmed_search$web_history,
always_return_list = TRUE
)
Hi David,
This morning when I tried running my code, it did not throw errors when hits are 100–150. Earlier, anything more than 50 hits used to throw errors. However, anything more than 200 hits still consistently breaks at entrez_summary() with Error: HTTP failure: 502.
Hi @Monty9 ,
Sorry, I don't have much time to work on this at present. All I can tell you is that a "502" is a server-side error (computers on the NCBI side not talking to each other as expected).
In general, it's a good idea to "chunk" large requests into smaller subsets. The NCBI does seem to get flaky at times (they suggest only doing large jobs at "off peak" (USA) times). There's not much that rentrez can do about that, but I'll see if we can capture the errors or provide useful documentation about these problems. You might consider something similar for your web app (i.e. providing users with informative messages when you run into these errors).
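One way to surface these transient failures to app users, as suggested above, is to wrap the request in tryCatch with a few retries. A minimal sketch (the helper name, retry count, and wait time are all assumptions, not part of rentrez):

```r
# Hypothetical helper: retry a flaky NCBI request a few times, then
# fail with a message suitable for showing to an app user.
fetch_with_retry <- function(fetch_fun, tries = 3, wait = 5) {
    for (i in seq_len(tries)) {
        res <- tryCatch(fetch_fun(), error = function(e) e)
        if (!inherits(res, "error")) {
            return(res)
        }
        message("NCBI request failed (attempt ", i, "): ",
                conditionMessage(res))
        Sys.sleep(wait)  # give the servers a moment before retrying
    }
    stop("NCBI is not responding; please try again at an off-peak time.")
}
```

It could then wrap any rentrez call, e.g. fetch_with_retry(function() entrez_fetch(db = "pubmed", web_history = wh, rettype = "xml", retmax = 100)).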
Hi David, thank you for the response. I will incorporate your suggestion. Do I need to install the tweaked version every time? My understanding is that since you suggested I install the tweaked version, I see these specific errors (502 and 400). How long is the tweaked version valid?
Hi Monty,
I will probably improve the error handling in the main version of rentrez in the next week or so, at least to always return text errors. I'll let you know when it's done.
Hi,
First, many thanks for this package; it's very useful. I've tried the dev version of rentrez (1.0.1) with R version 3.2.3 (2015-12-10), and I want to use entrez_summary() with the web_history option. In my session, only version "1.0" works fine.
es <- entrez_search(db = "pubmed", query, use_history = TRUE)
> esum_1 <- entrez_summary(db="pubmed", web_history = es$web_history,version="1.0")
> esum_1
List of 5161 esummary records. First record:
Version "2.0" leads to a parse error message:
> esum_2 <- entrez_summary(db="pubmed", web_history = es$web_history,version="2.0")
Erreur : parse error: premature EOF
(right here) ------^
or an HTTP 500 failure:
Erreur : HTTP failure: 500
<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html>
<head>
<title>NCBI/eutils211 - WWW Error 500 Diagnostic</title>
<style type="text/css"><![CDATA[
h1.error {color: red; font-size: 40pt}
div.diags {text-indent: 0.5in }
]]></style>
</head>
<body>
<h1>Server Error</h1>
<p>Your request could not be processed due to a problem on
our Web server. This could be a transient problem, please
try the query again. If it doesn't clear up within a
reasonable period of time, e-mail a short description of your
query and the diagnostic information shown below to:</p>
<p>
pubmed@nlm.nih.gov - for problems with PubMed<br/>
webadmin@ncbi.nlm.nih.gov - for problems with other services<br/>
</p>
<p>Thank you for your assistance. We will try to fix the
problem as soon as possible.
</p>
<hr/>
<p>
Diagnostic Information:</p>
<div class="diags">Error: 500</div>
I don't know how to deal with that error. Any suggestions ?
Bonjour @arnome, and thanks for this report.
These "transient" errors are most likely to happen with large (more than a few hundred record) requests. It's probably a good idea to "chunk" these into smaller requests. With the web_history approach, that means using retstart and retmax, something similar to the last chunk in this vignette example.
Hi David, thanks for your reply, it works great now. I leave here a function I've written, directly inspired by the vignette:
# Function : full_entrez_summary(db, es, step)
# in  : db : name of the db to search, es : an entrez_search object, step : chunk size
# out : an esummary_list of esummary records
# ex  : summaries <- full_entrez_summary("pubmed", es, 50)
full_entrez_summary <- function(db, es, step)
{
  # get partial summaries, step by step
  for (i in seq(0, es$count, step)) {
    esum <- entrez_summary(db = db, web_history = es$web_history,
                           version = "2.0", always_return_list = TRUE,
                           retstart = i, retmax = step)
    if (i != 0) {
      # not the first step: append to what we have so far
      esum_t <- append(esum_t, esum)
    } else {
      # first step: just memorise
      esum_t <- esum
    }
  }
  # reattribute the right class (esummary_list), lost with append()
  class(esum_t) <- c("esummary_list", "list")
  return(esum_t)
}
be seeing you, arnome.
Cheers @arnome,
Good idea with re-adding the class. I'll have to include a note about this in the vignette!
Hey @Monty9, the new master branch should handle http error codes smoothly. Do you want to check it out?
Hi David, sure, I would like to try that. Do I need to run any update?
Hi @Monty9,
Yeah, the new release is on CRAN now, so any of git pull and a local install, devtools::install_github, or install.packages() should work :)
I get the following error for my code, which has been working perfectly for a while now: Error in as.vector: cannot coerce type 'externalptr' to vector of type 'character'. We discussed this error earlier, and the cause of it was empty records. I fixed it with an if condition to overcome the blank-records issue. However, this time it is unpredictable. As of now, when I repeatedly run the code with a new search query every time (and for hits > 100), it breaks at the entrez_fetch() line below. What I noticed is that when I go back and re-run just that part of the code, it runs successfully (it also seems to depend on the number of hits, the time it takes to run the entrez_fetch() function, etc.). Not sure if you can reproduce it or have come across this error. Will too many hits to a search query cause an overload on entrez_fetch()? Please find the code below. Note: if you change the OR to AND in the query, it works fine; the number of hits will then be 117. Also, I have read your solution on #70 about parsing large queries, and I am working to implement it. If I have to do that, I guess I need to parse the XML individually for all the elements that extract_from_esummary will extract for me in a single function.
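For the chunked-summary approach mentioned above, combining per-chunk entrez_summary() calls with extract_from_esummary() might look like this (a sketch; the chunk size of 100 and the "title" field are example choices, and pubmed_search is assumed to come from entrez_search with use_history = TRUE):

```r
library(rentrez)

# Sketch: summarise records in chunks of 100 and pull one field
# ("title") out of each chunk as we go, avoiding one huge request.
step <- 100
titles <- unlist(lapply(seq(0, pubmed_search$count - 1, by = step),
                        function(i) {
    recs <- entrez_summary(db = "pubmed",
                           web_history = pubmed_search$web_history,
                           retstart = i, retmax = step,
                           always_return_list = TRUE)
    extract_from_esummary(recs, "title")
}))
```

This keeps each request small while still yielding a single vector of the extracted field across all hits.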