ropensci / rentrez

talk with NCBI entrez using R
https://docs.ropensci.org/rentrez

Error in as.vector(x, "character") : cannot coerce type 'externalptr' to vector of type 'character' #63

Closed gadepallivs closed 9 years ago

gadepallivs commented 9 years ago

Hi David, I will have to work on a reproducible example and will try to do that. But here is the traceback of the current one.

traceback()
4: Sys.sleep(0.001)
3: withCallingHandlers(expr, error = function(e) {
       handle <- getOption("shiny.error")
       if (is.function(handle)) 
           handle()
   })
2: shinyCallingHandlers(while (!.globals$stopped) {
       serviceApp()
       Sys.sleep(0.001)
   })
1: shiny::runApp("Sorting_App")
gadepallivs commented 9 years ago

Hi David, is there a way I can upload the code or send it to your email instead of a small example? Everything is integrated, and I am unable to understand where the error occurs, or at which line of the code, in order to troubleshoot it.

dwinter commented 9 years ago

You could post it as a gist, or, if it's really complex, its own GitHub repo?

The traceback() result above is something to do with shiny, rather than a problem with rentrez.

On Fri, Sep 11, 2015 at 10:41 AM, Monty9 notifications@github.com wrote:


dwinter commented 9 years ago

Hey @Monty9 -- I'm going to close this. Feel free to reopen it if you can create a reproducible example.

gadepallivs commented 9 years ago

Hi David, you are correct. parse_pubmed_xml(data.pubmed) is returning a null record for one search. I am not sure why, even though PMID 25905152 has a record.

query <- "25905152"
pub.search <- entrez_search(db = "pubmed", term = query, field = "ALL", retmax = 20)
data.pubmed <- entrez_fetch(db = "pubmed", id = pub.search$ids, rettype = "xml")
n <- parse_pubmed_xml(data.pubmed)

I am not sure what could have led to this; the same code was working well last week. Thank you for directing me to the correct error. Of course I could not solve it yet. I had a clue that it was XML-related, but it did not strike me that it was resulting in an empty vector, which the code above does. I was digging in the wrong place: I thought the error came from Shiny, since traceback() pointed there.

dwinter commented 9 years ago

OK, so ...

That record is a book, and I guess parse_pubmed_xml thinks everything is a journal article. To be honest, I'm thinking about deleting the parse_pubmed_xml function, because of exactly these kinds of errors.

Depending on what you want to get out of the pubmed records, I'd use entrez_summary instead. There is no need to do the search here: you already have a PubMed ID, so you could do this

data.pubmed <- entrez_summary(db="pubmed", id=25905152)

If you pass multiple IDs you will get a list of records, you could then use extract_from_esummary to get data from each one.
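For example, a minimal sketch of the multi-ID case (this assumes network access to NCBI; the field names are taken from the esummary listing shown later in this thread):

```r
library(rentrez)

# Fetch summaries for several PMIDs at once (requires network access)
recs <- entrez_summary(db = "pubmed", id = c(25905152, 26386083))

# Pull the same fields from every record; this returns a matrix
# with one column per ID
extract_from_esummary(recs, c("title", "pubdate", "lastauthor"))
```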

gadepallivs commented 9 years ago

Aah, cool. I will rewrite the code using entrez_summary and extract_from_esummary.

dwinter commented 9 years ago

Oh, I should say, while #64 is still open I suggest using always_return_list=TRUE if you are using entrez_summary in a case where it gets a variable number of IDs. That way you can always safely use extract_from_esummary.
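A sketch of that defensive pattern (network call assumed; the ids vector here is just a stand-in for whatever variable-length set of PMIDs your app produces):

```r
library(rentrez)

ids <- c("25905152")  # in the app this might be length 1 or many

# With always_return_list = TRUE you always get an esummary list back,
# even for a single ID, so extract_from_esummary behaves consistently
recs <- entrez_summary(db = "pubmed", id = ids, always_return_list = TRUE)
extract_from_esummary(recs, "title")
```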

gadepallivs commented 9 years ago

This entrez_summary makes it easy to extract the data from records. However, I don't see the abstract listed under the summary record. When I access attributes, it outputs ::: "has abstract". Is there a workaround to extract the abstract of an article?

esummary result with 43 items:
 [1] uid               pubdate           epubdate          source            authors           lastauthor       
 [7] title             sorttitle         volume            issue             pages             lang             
[13] nlmuniqueid       issn              essn              pubtype           recordstatus      pubstatus        
[19] articleids        history           references        attributes        pmcrefcount       fulljournalname  
[25] elocationid       viewcount         doctype           srccontriblist    booktitle         medium           
[31] edition           publisherlocation publishername     srcdate           reportnumber      availablefromurl 
[37] locationlabel     doccontriblist    docdate           bookname          chapter           sortpubdate      
[43] sortfirstauthor  

Thank you

dwinter commented 9 years ago

Hmm, looks like you need the full records for the abstract.

 data.pubmed <- entrez_fetch(db="pubmed", id=25905152, rettype="xml", parsed=TRUE)
data.pubmed["//AbstractText"]

Check out the vignette section for a tiny bit about using XPath to extract elements from XML files. The other option is just to use xmlToList and pick the bits you want.
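The xmlToList route might look something like this (a sketch, assuming network access; str() is just used here to inspect the nested list before picking elements out of it):

```r
library(rentrez)
library(XML)

# Fetch an already-parsed XML record (requires network access)
data.pubmed <- entrez_fetch(db = "pubmed", id = 25905152,
                            rettype = "xml", parsed = TRUE)

# Convert the XML tree to a plain nested list and inspect its structure,
# then index into the pieces you want
rec <- XML::xmlToList(XML::xmlRoot(data.pubmed))
str(rec, max.level = 2)
```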

gadepallivs commented 8 years ago

Hi David, just curious and trying to understand. The XML output from entrez_fetch of both a journal article and a book publication looks similar to me. Is there something specific I should look at to understand how the parse_pubmed_xml function recognizes only "PubmedArticleSet/PubmedArticle" journal articles but not book records? I see the function passes this specific argument in your repository; could we add an if/else condition to overcome the issue? Second, why can't xmlTreeParse recognize the entrez_fetch output? It throws the same error:

Error in as.vector(x, "character") :  cannot coerce type 'externalptr' to vector of type 'character' 
traceback()
5: as.character.default(x) 
4: as.character(x) 
3: structure(as.character(x), names = names(x)) 
2: grep(sprintf("^%s?\\s*<", BOMRegExp), file, perl = TRUE, useBytes = TRUE) 
1: xmlTreeParse(data.pubmed, asText = TRUE) 

By the way, the following worked for me. Thank you for the suggestion.

pmids <- c("26386083","26273372","26066373","25837167","25466451","25013473","24888229","24348463","24071017","24019382","23927882","23825589","23792568")
data.pubmed <- entrez_fetch(db = "pubmed", id = pmids, rettype = "xml", parsed = TRUE)
abstracts <-  xpathApply(data.pubmed, "//Abstract", xmlValue)
names(abstracts) <- pmids
dwinter commented 8 years ago

Hi @Monty9,

I suspect we've found the source of the error that started this thread :smile:

You are trying to re-parse an already parsed record. This works fine...

data.pubmed <- entrez_fetch(db="pubmed", id=25905152, rettype="xml", parsed=FALSE)
x <- XML::xmlTreeParse(data.pubmed)

... this does not

data.pubmed <- entrez_fetch(db="pubmed", id=25905152, rettype="xml", parsed=TRUE)
x <- XML::xmlTreeParse(data.pubmed)

because data.pubmed is already parsed

class(data.pubmed)
[1] "XMLInternalDocument" "XMLAbstractDocument"

For dealing with the book articles, I recommend writing different functions for book and article records, then use some combination of switch and if...else to call them on the appropriate nodes with XML::xmlApply. We do something similar in entrez_summary.r if you want an idea of what to do.
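A minimal sketch of that dispatch idea (the XML snippet is invented for illustration, and parse_article/parse_book are hypothetical placeholder parsers; real ones would pull out titles, abstracts, and so on):

```r
library(XML)

# Invented miniature of a mixed pubmed fetch result, for illustration only
txt <- '<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><PMID>111</PMID></MedlineCitation></PubmedArticle>
  <PubmedBookArticle><BookDocument><PMID>222</PMID></BookDocument></PubmedBookArticle>
</PubmedArticleSet>'
doc <- xmlParse(txt, asText = TRUE)

# Hypothetical per-type parsers
parse_article <- function(node) {
  c(type = "article", pmid = xmlValue(node[["MedlineCitation"]][["PMID"]]))
}
parse_book <- function(node) {
  c(type = "book", pmid = xmlValue(node[["BookDocument"]][["PMID"]]))
}

# Dispatch on the node name with switch(), applied over each record node
parsed <- xmlApply(xmlRoot(doc), function(node) {
  switch(xmlName(node),
         PubmedArticle     = parse_article(node),
         PubmedBookArticle = parse_book(node),
         stop("Unknown record type: ", xmlName(node)))
})
```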

gadepallivs commented 8 years ago

Noted another issue. When we extract the data using extract_from_esummary, it results in a matrix where text in some columns creeps into others, resulting in irrelevant data with respect to the columns.

pubrecord.table <- extract_from_esummary(esummaries = p.data, elements = 
    c("uid", "title", "fulljournalname", "pubtype", "volume", "issue", 
      "pages", "lastauthor", "pmcrefcount", "issn", "pubdate")) 

pubrecord.table <- t(pubrecord.table)  # transpose rows to columns
write.csv(pubrecord.table, file = "test12.csv")
gadepallivs commented 8 years ago

Hi David, I noted that. I will change it and see if this solves the error. Thank you.

dwinter commented 8 years ago

If you can make a reproducible example of the "column creep", please open another issue for it -- though it seems more likely to be a problem with the way the csv file is written and read?