ropensci / rentrez

talk with NCBI entrez using R
https://docs.ropensci.org/rentrez

extract MeSH terms and Author Key Words from a PubMed record #134

Closed sbalci closed 5 years ago

sbalci commented 5 years ago

Hi, I would like to extract MeSH terms and author keywords using this package.

The MeSH terms and Author Keywords can be seen when the data is downloaded as xml.

rentrez::entrez_fetch(db="pubmed", id=27591765, rettype = "xml")

But it seems that extract_from_esummary() and parse_pubmed_xml() do not include information about MeSH terms and author keywords. They can be found in the XML here: <MeshHeading> and <KeywordList Owner="NOTNLM">.

Something like the following code would be useful:

x <- rentrez::entrez_summary(db="pubmed", id=27591765)
y <- rentrez::extract_from_esummary(esummaries = x, elements = "mesh")

Best regards,

Serdar Balci

dwinter commented 5 years ago

Hi @sbalci ,

I suspect this is a little bit beyond what rentrez can do by itself, but you should be able to write some functions on top of the rentrez code to get what you are after. In this case, a table of MeSH headings from a PMID could be something like

library(rentrez)
library(XML)

MeSH_from_pmid <- function(pmid){
   # Fetch the record as a parsed XML document
   rec <- entrez_fetch(db="pubmed", id=pmid, rettype = "xml", parsed=TRUE)
   # Descriptor names and their unique identifiers (the UI attribute)
   m_names <- xpathSApply(rec, "//MeshHeadingList/MeshHeading/DescriptorName", xmlValue)
   m_ui <- xpathSApply(rec, "//MeshHeadingList/MeshHeading/DescriptorName", xmlGetAttr, "UI")
   data.frame(mesh_ui = m_ui, descriptor = m_names)
}
MeSH_from_pmid(27591765)
   mesh_ui                             descriptor
1  D002288               Adenocarcinoma, Mucinous
2  D000328                                  Adult
3  D000368                                   Aged
4  D001650                    Bile Duct Neoplasms
5  D021441           Carcinoma, Pancreatic Ductal
6  D002291                   Carcinoma, Papillary
7  D062506                              Claudin-4
8  D005260                                 Female
9  D015972 Gene Expression Regulation, Neoplastic
10 D006801                                 Humans
11 D008297                                   Male
12 D008875                            Middle Aged
13 D009077                                 Mucins
14 D010190                   Pancreatic Neoplasms

People generally suggest using the newer xml2 library, which is probably easier to use for these XPath statements (but I am old and stuck in my ways!). So there may be better ways to go about this, but I hope this is some help to you. (Closing for now, but feel free to ask more questions.)
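
For anyone who prefers xml2, the same XPath query might look roughly like the sketch below. This is only an illustration, not part of rentrez: mesh_from_xml() is a hypothetical helper, and the XML string is a small hand-written fragment that imitates the PubMed layout so the example runs without a network call. With a real record, the string would presumably come from entrez_fetch(db="pubmed", id=pmid, rettype="xml").

```r
library(xml2)

# Hand-written fragment imitating the PubMed XML layout (illustrative only).
# A real record would come from
# rentrez::entrez_fetch(db = "pubmed", id = pmid, rettype = "xml").
example_xml <- '<PubmedArticleSet><PubmedArticle><MedlineCitation>
  <MeshHeadingList>
    <MeshHeading><DescriptorName UI="D000328" MajorTopicYN="N">Adult</DescriptorName></MeshHeading>
    <MeshHeading><DescriptorName UI="D006801" MajorTopicYN="N">Humans</DescriptorName></MeshHeading>
  </MeshHeadingList>
</MedlineCitation></PubmedArticle></PubmedArticleSet>'

# mesh_from_xml() is a hypothetical helper, not a function from rentrez or xml2.
mesh_from_xml <- function(xml_string) {
  doc <- read_xml(xml_string)
  # Same XPath as in the XML-package version above
  nodes <- xml_find_all(doc, "//MeshHeadingList/MeshHeading/DescriptorName")
  data.frame(
    mesh_ui = xml_attr(nodes, "UI"),   # unique identifier attribute
    descriptor = xml_text(nodes),      # descriptor name
    stringsAsFactors = FALSE
  )
}

mesh_tbl <- mesh_from_xml(example_xml)
```

Because xml_attr() and xml_text() are vectorised over the node set, no apply-style call is needed, and a record without MeSH headings should simply give a zero-row data frame.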

sbalci commented 5 years ago

Thank you very much. Similarly, I could also get the author-supplied keywords:

library(rentrez)
library(XML)
Keyword_from_pmid <- function(pmid){
   rec <- entrez_fetch(db="pubmed", id=pmid, rettype = "xml", parsed=TRUE)
   keyword <- xpathSApply(rec, "//KeywordList/Keyword", xmlValue)
   data.frame(keywords = keyword)
}
Keyword_from_pmid(27591765)
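
Note that "//KeywordList/Keyword" matches every keyword list in the record. If only the author-supplied list (the one marked Owner="NOTNLM") is wanted, an attribute predicate in the XPath might do it. The sketch below is an assumption-laden illustration: author_keywords_from_xml() is a hypothetical helper, and the XML fragment and its keyword values are invented placeholders, not the actual keywords of any PubMed record.

```r
library(XML)

# Invented fragment with two keyword lists (placeholder values only).
example_xml <- '<PubmedArticleSet><PubmedArticle><MedlineCitation>
  <KeywordList Owner="NOTNLM">
    <Keyword MajorTopicYN="N">author keyword one</Keyword>
    <Keyword MajorTopicYN="N">author keyword two</Keyword>
  </KeywordList>
  <KeywordList Owner="NLM">
    <Keyword MajorTopicYN="N">other keyword</Keyword>
  </KeywordList>
</MedlineCitation></PubmedArticle></PubmedArticleSet>'

# author_keywords_from_xml() is a hypothetical helper, not part of rentrez or XML.
author_keywords_from_xml <- function(xml_string) {
  doc <- xmlParse(xml_string, asText = TRUE)
  # The [@Owner='NOTNLM'] predicate keeps only the author-supplied list
  kw <- xpathSApply(doc, "//KeywordList[@Owner='NOTNLM']/Keyword", xmlValue)
  # as.character() also handles the empty result when nothing matches
  data.frame(keywords = as.character(kw), stringsAsFactors = FALSE)
}

kw_tbl <- author_keywords_from_xml(example_xml)
```

With a live record, the string could be replaced by the output of entrez_fetch(db="pubmed", id=pmid, rettype="xml"), or the predicate could be dropped into the XPath of Keyword_from_pmid() directly.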