ropensci / europepmc

R Interface to Europe PMC RESTful Web Service
https://docs.ropensci.org/europepmc

add option to search resulttype=core in epmc_search? #7

Open cstubben opened 8 years ago

cstubben commented 8 years ago

The core results have all the lite fields plus MeSH terms, abstracts and others. Have you considered parsing resulttype=core so users can get MeSH terms for hundreds of articles at once? I have started an XML parser to get some core fields that might help.

njahn82 commented 8 years ago

europepmc::epmc_details() parses the resulttype=core format. For example, to get MeSH terms for more than one record, try:

lapply(c("25730202", "25891958"), function(x) europepmc::epmc_details(x)$mesh_topic)
## [[1]]
##   majorTopic_YN                        descriptorName
## 1             N             Chlamydomonas reinhardtii
## 2             N                   Amino Acid Sequence
## 3             N                             Phenotype
## 4             Y                              Mutation
## 5             N       Polymorphism, Single Nucleotide
## 6             N                                Genome
## 7             N                                 Light
## 8             N High-Throughput Nucleotide Sequencing
## 
## [[2]]
##    majorTopic_YN                    descriptorName
## 1              N                          Vacuoles
## 2              N      Plants, Genetically Modified
## 3              N                       Arabidopsis
## 4              N                           Petunia
## 5              N                             Seeds
## 6              N                 Proanthocyanidins
## 7              N      Proton-Translocating ATPases
## 8              N              Arabidopsis Proteins
## 9              N      Genetic Complementation Test
## 10             N Gene Expression Regulation, Plant
## 11             N              Biological Transport
## 12             N                          Mutation
## 13             N         Adenosine Triphosphatases

The function uses the JSON output because I think it is easier to parse. In addition to MeSH, it returns:

Please let me know if I have missed something. I could try to support "raw" outputs in the upcoming version, so everyone could apply alternative parsers.

I don't want to include the core format in epmc_search() because it is deeply nested and thus hard to parse. It would also require more memory.

cstubben commented 8 years ago

That will work for a few articles, but I'd like MeSH terms for hundreds of articles, and downloading them one at a time will take too long. I'd like the option to get raw output so users can create their own parsers.

Maybe change the id_list option to resulttype, with lite (default), idlist and core as values, and add a new format option (parsed by default, except perhaps for core, with DC, JSON or XML as alternatives):

epmc_search("title:Waddlia")  # return data.frame
epmc_search("title:Waddlia", resulttype="core", format="xml")

njahn82 commented 8 years ago

Great. Will try to implement it for the upcoming version.

njahn82 commented 7 years ago

There is now an option that returns the core format in list form:

my_list <- epmc_search('Gabi-Kat', output = 'raw', limit = 10)
# display the structure for one list element
str(my_list[[10]])
#> List of 40
#>  $ id                   : chr "27018849"
#>  $ source               : chr "MED"
#>  $ pmid                 : chr "27018849"
#>  $ pmcid                : chr "PMC4883958"
#>  $ doi                  : chr "10.1080/15592324.2016.1161876"
#>  $ title                : chr "Interaction between vitamin B6 metabolism, nitrogen metabolism and autoimmunity."
#>  $ authorString         : chr "Colinas M, Fitzpatrick TB."
#>  $ authorList           :List of 1
#>   ..$ author:List of 2
#>   .. ..$ :List of 6
#>   .. .. ..$ fullName   : chr "Colinas M"
#>   .. .. ..$ firstName  : chr "Maite"
#>   .. .. ..$ lastName   : chr "Colinas"
#>   .. .. ..$ initials   : chr "M"
#>   .. .. ..$ authorId   :List of 2
#>   .. .. .. ..$ type : chr "ORCID"
#>   .. .. .. ..$ value: chr "0000-0001-7053-2983"
#>   .. .. ..$ affiliation: chr "a Department of Botany and Plant Biology , University of Geneva , Geneva , Switzerland."
#>   .. ..$ :List of 5
#>   .. .. ..$ fullName   : chr "Fitzpatrick TB"
#>   .. .. ..$ firstName  : chr "Teresa B"
#>   .. .. ..$ lastName   : chr "Fitzpatrick"
#>   .. .. ..$ initials   : chr "TB"
#>   .. .. ..$ affiliation: chr "a Department of Botany and Plant Biology , University of Geneva , Geneva , Switzerland."
#>  $ authorIdList         :List of 1
#>   ..$ authorId:List of 1
#>   .. ..$ :List of 2
#>   .. .. ..$ type : chr "ORCID"
#>   .. .. ..$ value: chr "0000-0001-7053-2983"
#>  $ journalInfo          :List of 8
#>   ..$ issue               : chr "4"
#>   ..$ volume              : chr "11"
#>   ..$ journalIssueId      : int 2439536
#>   ..$ dateOfPublication   : chr "2016 "
#>   ..$ monthOfPublication  : int 0
#>   ..$ yearOfPublication   : int 2016
#>   ..$ printPublicationDate: chr "2016-01-01"
#>   ..$ journal             :List of 6
#>   .. ..$ title              : chr "Plant signaling & behavior"
#>   .. ..$ medlineAbbreviation: chr "Plant Signal Behav"
#>   .. ..$ isoabbreviation    : chr "Plant Signal Behav"
#>   .. ..$ issn               : chr "1559-2316"
#>   .. ..$ nlmid              : chr "101291431"
#>   .. ..$ essn               : chr "1559-2324"
#>  $ pubYear              : chr "2016"
#>  $ pageInfo             : chr "e1161876"
#>  $ abstractText         : chr "The essential micronutrient vitamin B6 is best known in its enzymatic cofactor form, pyridoxal 5'-phosphate (PLP). However, vit"| __truncated__
#>  $ affiliation          : chr "a Department of Botany and Plant Biology , University of Geneva , Geneva , Switzerland."
#>  $ language             : chr "eng"
#>  $ pubModel             : chr "Print"
#>  $ pubTypeList          :List of 1
#>   ..$ pubType: chr [1:2] "Journal Article" "Research Support, Non-U.S. Gov't"
#>  $ meshHeadingList      :List of 1
#>   ..$ meshHeading:List of 9
#>   .. ..$ :List of 3
#>   .. .. ..$ majorTopic_YN    : chr "N"
#>   .. .. ..$ descriptorName   : chr "Arabidopsis"
#>   .. .. ..$ meshQualifierList:List of 1
#>   .. .. .. ..$ meshQualifier:List of 2
#>   .. .. .. .. ..$ :List of 3
#>   .. .. .. .. .. ..$ abbreviation : chr "GE"
#>   .. .. .. .. .. ..$ qualifierName: chr "genetics"
#>   .. .. .. .. .. ..$ majorTopic_YN: chr "N"
#>   .. .. .. .. ..$ :List of 3
#>   .. .. .. .. .. ..$ abbreviation : chr "IM"
#>   .. .. .. .. .. ..$ qualifierName: chr "immunology"
#>   .. .. .. .. .. ..$ majorTopic_YN: chr "N"
#>   .. ..$ :List of 3
#>   .. .. ..$ majorTopic_YN    : chr "N"
#>   .. .. ..$ descriptorName   : chr "Nitrogen"
#>   .. .. ..$ meshQualifierList:List of 1
#>   .. .. .. ..$ meshQualifier:List of 1
#>   .. .. .. .. ..$ :List of 3
#>   .. .. .. .. .. ..$ abbreviation : chr "ME"
#>   .. .. .. .. .. ..$ qualifierName: chr "metabolism"
#>   .. .. .. .. .. ..$ majorTopic_YN: chr "Y"
#>   .. ..$ :List of 3
#>   .. .. ..$ majorTopic_YN    : chr "N"
#>   .. .. ..$ descriptorName   : chr "Vitamin B 6"
#>   .. .. ..$ meshQualifierList:List of 1
#>   .. .. .. ..$ meshQualifier:List of 1
#>   .. .. .. .. ..$ :List of 3
#>   .. .. .. .. .. ..$ abbreviation : chr "ME"
#>   .. .. .. .. .. ..$ qualifierName: chr "metabolism"
#>   .. .. .. .. .. ..$ majorTopic_YN: chr "Y"
#>   .. ..$ :List of 3
#>   .. .. ..$ majorTopic_YN    : chr "N"
#>   .. .. ..$ descriptorName   : chr "Arabidopsis Proteins"
#>   .. .. ..$ meshQualifierList:List of 1
#>   .. .. .. ..$ meshQualifier:List of 1
#>   .. .. .. .. ..$ :List of 3
#>   .. .. .. .. .. ..$ abbreviation : chr "ME"
#>   .. .. .. .. .. ..$ qualifierName: chr "metabolism"
#>   .. .. .. .. .. ..$ majorTopic_YN: chr "N"
#>   .. ..$ :List of 2
#>   .. .. ..$ majorTopic_YN : chr "N"
#>   .. .. ..$ descriptorName: chr "Temperature"
#>   .. ..$ :List of 2
#>   .. .. ..$ majorTopic_YN : chr "Y"
#>   .. .. ..$ descriptorName: chr "Autoimmunity"
#>   .. ..$ :List of 2
#>   .. .. ..$ majorTopic_YN : chr "N"
#>   .. .. ..$ descriptorName: chr "Gene Expression Regulation, Plant"
#>   .. ..$ :List of 2
#>   .. .. ..$ majorTopic_YN : chr "N"
#>   .. .. ..$ descriptorName: chr "Reproduction"
#>   .. ..$ :List of 2
#>   .. .. ..$ majorTopic_YN : chr "N"
#>   .. .. ..$ descriptorName: chr "Phenotype"
#>  $ keywordList          :List of 1
#>   ..$ keyword: chr [1:8] "Arabidopsis thaliana" "Autoimmunity" "plant defense" "Vitamin B6" ...
#>  $ chemicalList         :List of 1
#>   ..$ chemical:List of 3
#>   .. ..$ :List of 2
#>   .. .. ..$ name          : chr "Arabidopsis Proteins"
#>   .. .. ..$ registryNumber: chr "0"
#>   .. ..$ :List of 2
#>   .. .. ..$ name          : chr "Vitamin B 6"
#>   .. .. ..$ registryNumber: chr "8059-24-3"
#>   .. ..$ :List of 2
#>   .. .. ..$ name          : chr "Nitrogen"
#>   .. .. ..$ registryNumber: chr "N762921K75"
#>  $ subsetList           :List of 1
#>   ..$ subset:List of 1
#>   .. ..$ :List of 2
#>   .. .. ..$ code: chr "IM"
#>   .. .. ..$ name: chr "Index Medicus"
#>  $ fullTextUrlList      :List of 1
#>   ..$ fullTextUrl:List of 3
#>   .. ..$ :List of 5
#>   .. .. ..$ availability    : chr "Free"
#>   .. .. ..$ availabilityCode: chr "F"
#>   .. .. ..$ documentStyle   : chr "pdf"
#>   .. .. ..$ site            : chr "Europe_PMC"
#>   .. .. ..$ url             : chr "http://europepmc.org/articles/PMC4883958?pdf=render"
#>   .. ..$ :List of 5
#>   .. .. ..$ availability    : chr "Free"
#>   .. .. ..$ availabilityCode: chr "F"
#>   .. .. ..$ documentStyle   : chr "html"
#>   .. .. ..$ site            : chr "Europe_PMC"
#>   .. .. ..$ url             : chr "http://europepmc.org/articles/PMC4883958"
#>   .. ..$ :List of 5
#>   .. .. ..$ availability    : chr "Subscription required"
#>   .. .. ..$ availabilityCode: chr "S"
#>   .. .. ..$ documentStyle   : chr "doi"
#>   .. .. ..$ site            : chr "DOI"
#>   .. .. ..$ url             : chr "http://dx.doi.org/10.1080/15592324.2016.1161876"
#>  $ isOpenAccess         : chr "N"
#>  $ inEPMC               : chr "Y"
#>  $ inPMC                : chr "N"
#>  $ hasPDF               : chr "Y"
#>  $ hasBook              : chr "N"
#>  $ hasSuppl             : chr "N"
#>  $ citedByCount         : int 0
#>  $ hasReferences        : chr "Y"
#>  $ hasTextMinedTerms    : chr "Y"
#>  $ hasDbCrossReferences : chr "N"
#>  $ hasLabsLinks         : chr "N"
#>  $ epmcAuthMan          : chr "N"
#>  $ hasTMAccessionNumbers: chr "N"
#>  $ dateOfCompletion     : chr "2016-12-30"
#>  $ dateOfCreation       : chr "2016-05-11"
#>  $ dateOfRevision       : chr "2016-12-31"
#>  $ firstPublicationDate : chr "2016-03-28"
#>  $ embargoDate          : chr "2016-09-28"
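
Since the original request was MeSH terms for many articles at once, here is a minimal sketch of how the nested list could be flattened into one data frame. It assumes every record has the shape shown in the str() output above (an id element and an optional meshHeadingList$meshHeading list). The helper name extract_mesh is hypothetical and not part of europepmc, and the two records are hand-built stand-ins so the example runs without a network call.

```r
# Hypothetical helper, not part of the europepmc package: collect MeSH
# descriptors from records shaped like the str() output above.
extract_mesh <- function(records) {
  rows <- lapply(records, function(rec) {
    headings <- rec$meshHeadingList$meshHeading
    if (length(headings) == 0) {
      return(NULL)  # record carries no MeSH annotations
    }
    data.frame(
      id = rec$id,
      descriptorName = vapply(headings, function(h) h$descriptorName, character(1)),
      majorTopic_YN = vapply(headings, function(h) h$majorTopic_YN, character(1)),
      stringsAsFactors = FALSE
    )
  })
  # Drop records without MeSH headings, then stack the rest row-wise.
  rows <- Filter(Negate(is.null), rows)
  do.call(rbind, rows)
}

# Hand-built stand-in for the raw output; the second record is an invented
# placeholder used only to exercise the "no MeSH headings" branch.
records <- list(
  list(
    id = "27018849",
    meshHeadingList = list(meshHeading = list(
      list(majorTopic_YN = "N", descriptorName = "Arabidopsis"),
      list(majorTopic_YN = "Y", descriptorName = "Autoimmunity")
    ))
  ),
  list(id = "00000000")
)

mesh <- extract_mesh(records)
print(mesh)
```

If the raw output keeps this shape, the same helper should accept my_list from the call above directly, i.e. extract_mesh(my_list). Qualifiers (meshQualifierList) are ignored here to keep the sketch short.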