ropensci / europepmc

R Interface to Europe PMC RESTful Web Service
https://docs.ropensci.org/europepmc

epmc_search returns fewer fields than available in the API #57

Open arvi1000 opened 5 months ago

arvi1000 commented 5 months ago

Thank you for this package, maintainers!

I notice that epmc_search doesn't return some of the useful fields that are available in the API. I think it would be valuable to return all fields. For example, the API returns both the boolean hasTMAccessionNumbers and the accessionType (nested in tmAccessionTypeList), while the package returns only the former.

Example of different fields returned:

library(europepmc)
library(httr)

# get results for one id from the package and the api
package_result <- epmc_search("PMC10669250")
direct_api_result <-
  GET("https://www.ebi.ac.uk/europepmc/webservices/rest/search",
      query = list(query = "PMC10669250",
                   resultType = "lite",
                   format = "json")) |>
  content()

# compare fields returned
package_result |> names()
direct_api_result$resultList$result[[1]] |> unlist() |> names()

from the package:

 [1] "id"                    "source"                "pmcid"                 "title"                 "authorString"          "journalTitle"          "issue"                
 [8] "journalVolume"         "pubYear"               "journalIssn"           "pubType"               "isOpenAccess"          "inEPMC"                "inPMC"                
[15] "hasPDF"                "hasBook"               "hasSuppl"              "citedByCount"          "hasReferences"         "hasTextMinedTerms"     "hasDbCrossReferences" 
[22] "hasLabsLinks"          "hasTMAccessionNumbers" "firstIndexDate"        "firstPublicationDate" 

from the API:

 [1] "id"                                "source"                            "pmcid"                             "fullTextIdList.fullTextId"        
 [5] "title"                             "authorString"                      "journalTitle"                      "issue"                            
 [9] "journalVolume"                     "pubYear"                           "journalIssn"                       "pubType"                          
[13] "isOpenAccess"                      "inEPMC"                            "inPMC"                             "hasPDF"                           
[17] "hasBook"                           "hasSuppl"                          "citedByCount"                      "hasReferences"                    
[21] "hasTextMinedTerms"                 "hasDbCrossReferences"              "hasLabsLinks"                      "hasTMAccessionNumbers"            
[25] "tmAccessionTypeList.accessionType" "firstIndexDate"                    "firstPublicationDate"    
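
The two listings can also be compared mechanically with setdiff(). The sketch below hard-codes the field names exactly as printed above so that it runs without network access; the variable names package_fields, api_fields and missing_fields are illustrative only and are not part of the package.

```r
# Field names copied from the two listings above (no API call needed).
package_fields <- c(
  "id", "source", "pmcid", "title", "authorString", "journalTitle", "issue",
  "journalVolume", "pubYear", "journalIssn", "pubType", "isOpenAccess",
  "inEPMC", "inPMC", "hasPDF", "hasBook", "hasSuppl", "citedByCount",
  "hasReferences", "hasTextMinedTerms", "hasDbCrossReferences",
  "hasLabsLinks", "hasTMAccessionNumbers", "firstIndexDate",
  "firstPublicationDate"
)

# The API listing has the same fields plus two nested ones, which
# unlist() flattened into "parent.child" names.
api_fields <- c(
  "id", "source", "pmcid", "fullTextIdList.fullTextId", "title",
  "authorString", "journalTitle", "issue", "journalVolume", "pubYear",
  "journalIssn", "pubType", "isOpenAccess", "inEPMC", "inPMC", "hasPDF",
  "hasBook", "hasSuppl", "citedByCount", "hasReferences",
  "hasTextMinedTerms", "hasDbCrossReferences", "hasLabsLinks",
  "hasTMAccessionNumbers", "tmAccessionTypeList.accessionType",
  "firstIndexDate", "firstPublicationDate"
)

# Fields present in the API response but absent from the package result.
missing_fields <- setdiff(api_fields, package_fields)
missing_fields
#> [1] "fullTextIdList.fullTextId"         "tmAccessionTypeList.accessionType"
```

For this record, both missing fields are nested lists in the JSON response, which may be why they are dropped from the flat data frame.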

njahn82 commented 5 months ago

Hi @arvi1000, you're right, the default method only returns a subset of the Europe PMC data. To access all fields, set output = "raw", which returns the records as a nested list. Here's an example parser for your query:

library(europepmc)
library(tidyverse)
my_epmc_data <- epmc_search("PMC10669250", output = "raw")
#> 1 records found, returning 1

tibble::tibble(
  id = map_chr(my_epmc_data, "id"),
  tm_accession_type = map(my_epmc_data, "tmAccessionTypeList") |>
    map_chr("accessionType")
)
#> # A tibble: 1 × 2
#>   id          tm_accession_type
#>   <chr>       <chr>            
#> 1 PMC10669250 chebi

Created on 2024-06-12 with reprex v2.1.0
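
The parser above assumes that every record carries exactly one accession type. A record with several types, or with none, would need different handling. The following is a minimal base R sketch of one way to cover those cases. It runs on a hypothetical stand-in for the raw output, because the exact nesting of the raw list is an assumption here. The helper collapse_accession_types() and the records EXAMPLE2 and EXAMPLE3 are invented for illustration.

```r
# Hypothetical stand-in for epmc_search(..., output = "raw"): a list of
# records, each a nested list. Only the first id/type pair comes from the
# thread; EXAMPLE2 and EXAMPLE3 are made up to exercise the edge cases.
raw_records <- list(
  list(id = "PMC10669250",
       tmAccessionTypeList = list(accessionType = list("chebi"))),
  list(id = "EXAMPLE2",
       tmAccessionTypeList = list(accessionType = list("chebi", "uniprot"))),
  list(id = "EXAMPLE3")  # no tmAccessionTypeList at all
)

# Collapse all accession types of one record into a single string.
# Returns NA when the record has no accession types.
collapse_accession_types <- function(record) {
  types <- unlist(record[["tmAccessionTypeList"]][["accessionType"]])
  if (length(types) == 0) NA_character_ else paste(types, collapse = "; ")
}

accession_summary <- data.frame(
  id = vapply(raw_records, function(r) r[["id"]], character(1)),
  tm_accession_type = vapply(raw_records, collapse_accession_types,
                             character(1)),
  stringsAsFactors = FALSE
)
accession_summary
```

Collapsing to one string keeps one row per record. If one row per accession type is preferred, a list column would be the alternative.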