Closed gadepallivs closed 9 years ago
Hi david, Is there a way I can upload the code or send it to your email instead of a small example? Everything is integrated, and I am basically unable to understand where the error occurs, or at which line of the code, in order to troubleshoot it.
You could post it as a gist, or if it's really complex, its own GitHub repo?
The traceback() result above is something to do with shiny, rather than a problem with rentrez.
Hey @Monty9 -- I'm going to close this. Feel free to reopen it if you can create a reproducible example.
Hi david,
You are correct. parse_pubmed_xml(data.pubmed) is returning a null record for one search. Not sure why, even though PMID 25905152 has a record:
query <- "25905152"
pub.search <- entrez_search(db = "pubmed", term = query, field = "ALL", retmax = 20)
data.pubmed <- entrez_fetch(db = "pubmed", id = pub.search$ids, rettype = "xml")
n <- parse_pubmed_xml(data.pubmed)
I am not sure what could have led to this. The same code was working well last week. Thank you for directing me to the actual error. I had a clue that it was XML, but it did not strike me that it was returning an empty vector; the code above does return an empty vector. I was digging in totally the wrong place, thinking the error came from shiny, since traceback() pointed there.
OK, so ...
That record is a book, and I guess parse_pubmed_xml thinks everything is a journal article. To be honest, I'm thinking about deleting the parse_pubmed_xml function, because of exactly these kinds of errors. Depending on what you want to get out of the pubmed records, I'd use entrez_summary instead.
There is no need to do the search here; you already have a pubmed ID, so you could do this:
data.pubmed <- entrez_summary(db="pubmed", id=25905152)
If you pass multiple IDs you will get a list of records; you could then use extract_from_esummary to get data from each one.
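For illustration, a minimal sketch of that pattern (the PMIDs are the ones that appear later in this thread; running it requires rentrez and network access):

```r
library(rentrez)

# With more than one ID, entrez_summary() returns a list of esummary records.
pmids <- c("26386083", "26273372", "26066373")
summs <- entrez_summary(db = "pubmed", id = pmids)

# Pull one field out of every record; returns a named vector.
titles <- extract_from_esummary(summs, "title")

# Asking for several fields at once returns a matrix (elements x records).
tab <- extract_from_esummary(summs, c("uid", "title", "pubdate"))
```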
Aah, cool. I will rewrite the code using entrez_summary and extract_from_esummary.
Oh, I should say: while #64 is still open I suggest using always_return_list=TRUE if you are calling entrez_summary in a case where it gets a variable number of IDs. That way you can always safely use extract_from_esummary.
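A short sketch of why that flag helps (network access required; the single PMID is the one from earlier in this thread):

```r
library(rentrez)

# With a single ID, entrez_summary() normally returns a bare esummary record,
# not a list. always_return_list = TRUE forces a one-element list, so the
# same extract_from_esummary() code works whether you pass 1 ID or many.
ids <- "25905152"  # could just as well be a vector of PMIDs
summs <- entrez_summary(db = "pubmed", id = ids, always_return_list = TRUE)
titles <- extract_from_esummary(summs, "title")
```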
entrez_summary makes it easy to extract data from records. However, I don't see an abstract listed in the summary record. When I access attributes, it outputs "has abstract". Is there a workaround to extract the abstract of an article?
esummary result with 43 items:
[1] uid pubdate epubdate source authors lastauthor
[7] title sorttitle volume issue pages lang
[13] nlmuniqueid issn essn pubtype recordstatus pubstatus
[19] articleids history references attributes pmcrefcount fulljournalname
[25] elocationid viewcount doctype srccontriblist booktitle medium
[31] edition publisherlocation publishername srcdate reportnumber availablefromurl
[37] locationlabel doccontriblist docdate bookname chapter sortpubdate
[43] sortfirstauthor
Thank you
Hmm, looks like you need the full records for the abstract.
data.pubmed <- entrez_fetch(db="pubmed", id=25905152, rettype="xml", parsed=TRUE)
data.pubmed["//AbstractText"]
Check out the vignette section for a tiny bit about using XPath to extract elements from XML files. The other option is just to use xmlToList and pick the bits you want.
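A sketch of the xmlToList route (network access required; the list path in the comment is typical for journal articles and is an assumption, not verified for this particular record, which is a book):

```r
library(rentrez)
library(XML)

# Fetch the already-parsed record and convert the whole document
# into a nested R list, then pick elements out by name.
data.pubmed <- entrez_fetch(db = "pubmed", id = 25905152,
                            rettype = "xml", parsed = TRUE)
rec <- XML::xmlToList(data.pubmed)

# Inspect str(rec) first to find the right path; for a journal article
# the abstract typically lives at something like:
# rec$PubmedArticle$MedlineCitation$Article$Abstract$AbstractText
```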
Hi david,
Just curious and trying to understand: the XML output from entrez_fetch for both a journal article and a book publication looks similar to me. Are there specifics I should look at to understand how the parse_pubmed_xml function recognizes only "PubmedArticleSet/PubmedArticle" journal articles but not book records? I see the function has this specific argument passed in your repository. Could we add some if/else condition to overcome the issue?
Second, why can't xmlTreeParse recognize the entrez_fetch output? It throws the same error:
Error in as.vector(x, "character") : cannot coerce type 'externalptr' to vector of type 'character'
traceback()
5 as.character.default(x)
4 as.character(x)
3 structure(as.character(x), names = names(x))
2 grep(sprintf("^%s?\\s*<", BOMRegExp), file, perl = TRUE, useBytes = TRUE)
1 xmlTreeParse(data.pubmed, asText = TRUE)
By the way, the following worked for me. Thank you for the suggestion:
pmids <- c("26386083","26273372","26066373","25837167","25466451","25013473","24888229","24348463","24071017","24019382","23927882","23825589","23792568")
data.pubmed <- entrez_fetch(db = "pubmed", id = pmids, rettype = "xml", parsed = TRUE)
abstracts <- xpathApply(data.pubmed, "//Abstract", xmlValue)
names(abstracts) <- pmids
Hi @Monty9,
I suspect we've found the source of the error that started this thread :smile:
You are trying to re-parse an already-parsed record. This works fine...
data.pubmed <- entrez_fetch(db="pubmed", id=25905152, rettype="xml", parsed=FALSE)
x <- XML::xmlTreeParse(data.pubmed)
... this does not
data.pubmed <- entrez_fetch(db="pubmed", id=25905152, rettype="xml", parsed=TRUE)
x <- XML::xmlTreeParse(data.pubmed)
because data.pubmed is already parsed:
class(data.pubmed)
[1] "XMLInternalDocument" "XMLAbstractDocument"
For dealing with the book records, I recommend writing different functions for book and article records, then using some combination of switch and if...else to call them on the appropriate nodes with XML::xmlApply. We do something similar in entrez_summary.r if you want an idea of what to do.
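A hypothetical sketch of that dispatch pattern -- parse_article() and parse_book() are placeholder names for parsers you would write yourself, not functions from rentrez or XML:

```r
library(XML)

# Dispatch on the node name of each record, so journal articles and
# books each get their own parser (both placeholders here).
parse_record <- function(node) {
  switch(XML::xmlName(node),
         PubmedArticle     = parse_article(node),
         PubmedBookArticle = parse_book(node),
         stop("Unknown record type: ", XML::xmlName(node)))
}

# Applied over every child of the document root, e.g.:
# records <- XML::xmlApply(XML::xmlRoot(doc), parse_record)
```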
I noted another issue. When we extract data using extract_from_esummary, the result is a matrix, and text from some columns creeps into others, so the data ends up in the wrong columns.
pubrecord.table <- extract_from_esummary(esummaries = p.data, elements =
    c("uid", "title", "fulljournalname", "pubtype", "volume", "issue",
      "pages", "lastauthor", "pmcrefcount", "issn", "pubdate"))
pubrecord.table <- t(pubrecord.table)  # transpose rows to columns
write.csv(pubrecord.table, file = "test12.csv")
Hi david. I noted that. Will change it and see if this solves the error. Thank you.
If you can make a reproducible example of the "column creep", please open another issue for it -- though it seems more likely to be a problem with the way the csv file is written and read?
Hi david, I will have to work on the reproducible example and will try to do that. But here is the traceback of the current one.