ropensci / europepmc

R Interface to Europe PMC RESTful Web Service
https://docs.ropensci.org/europepmc

europepmc::epmc_ftxt() doesn't work? #37

Open ESPoppelaars opened 3 years ago

ESPoppelaars commented 3 years ago

epmc_ftxt() no longer seems able to retrieve full texts. I've tried several articles that should be openly available according to Europe PMC, including some that I have successfully retrieved with this function before. Reproducible example:

library(europepmc)
pmid <- "31762656"
literature <- europepmc::epmc_search(pmid)
if (literature[1, "isOpenAccess"] == "Y") {
    result <- europepmc::epmc_ftxt(literature[1, "pmid"])
}

Gives:

Request failed [404]. Retrying in 1.3 seconds...
Request failed [404]. Retrying in 1 seconds...
Error in europepmc::epmc_ftxt(literature[1, "pmid"]) : 
  Not Found (HTTP 404). Failed to retrieve full text..

Trying a bunch of options that have worked in the past:

library(europepmc)
pmid <- c("31762656", "32376948", "32119693")
result <- vector(mode = "list", length = length(pmid))
for (i in seq_along(pmid)) {
    literature <- europepmc::epmc_search(pmid[i])
    result[[i]] <- tryCatch(
        {
            if (any(literature[, "isOpenAccess"] == "Y")) {
                europepmc::epmc_ftxt(literature[which(literature[, "isOpenAccess"] == "Y"),
                                                "pmid"])
            } else {
                list()
            }
        },
        error = function(request_failed)
        {
            return(list())
        }
    )
}

Gives:

1 records found, returning 1
Request failed [404]. Retrying in 1.4 seconds...
Request failed [404]. Retrying in 2.3 seconds...
1 records found, returning 1
Request failed [404]. Retrying in 1 seconds...
Request failed [404]. Retrying in 3.5 seconds...
1 records found, returning 1
Request failed [404]. Retrying in 1.5 seconds...
Request failed [404]. Retrying in 4 seconds...

europepmc::epmc_search() still works though, as does europepmc::epmc_details().

njahn82 commented 3 years ago

Thank you for this helpful report. It seems the full-text route only accepts PMCIDs, not PMIDs.

# a) call with a PMID
europepmc::epmc_ftxt("31762656")
#> Request failed [404]. Retrying in 1.7 seconds...
#> Request failed [404]. Retrying in 1 seconds...
#> Error in europepmc::epmc_ftxt("31762656"): Not Found (HTTP 404). Failed to retrieve full text..

# b) follow above example, but use pmcid instead of pmid
pmid <- c("31762656", "32376948", "32119693")
result <- vector(mode = "list", length = length(pmid))
for (i in seq_along(pmid)) {
  literature <- europepmc::epmc_search(pmid[i])
  result[[i]] <- tryCatch(
    {
      if (any(literature[, "isOpenAccess"] == "Y")) {
        europepmc::epmc_ftxt(literature[which(literature[, "isOpenAccess"] == "Y"),
                                        "pmcid"])
      } else {
        list()
      }
    },
    error = function(request_failed)
    {
      return(list())
    }
  )
}
#> 1 records found, returning 1
#> 1 records found, returning 1
#> 1 records found, returning 1
result
#> [[1]]
#> {xml_document}
#> <article article-type="research-article" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML">
#> [1] <front>\n  <journal-meta>\n    <journal-id journal-id-type="nlm-ta">Saudi ...
#> [2] <body>\n  <sec id="s0005">\n    <label>1</label>\n    <title>Introduction ...
#> [3] <back>\n  <ref-list id="bi005">\n    <title>References</title>\n    <ref  ...
#> 
#> [[2]]
#> {xml_document}
#> <article article-type="research-article" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
#> [1] <front>\n  <journal-meta>\n    <journal-id journal-id-type="nlm-ta">Sci R ...
#> [2] <body>\n  <sec id="Sec1" sec-type="introduction">\n    <title>Introductio ...
#> [3] <back>\n  <fn-group>\n    <fn>\n      <p><bold>Publisher’s note</bold> Sp ...
#> 
#> [[3]]
#> {xml_document}
#> <article article-type="research-article" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML">
#> [1] <front>\n  <journal-meta>\n    <journal-id journal-id-type="nlm-ta">PLoS  ...
#> [2] <body>\n  <sec sec-type="intro" id="sec001">\n    <title>Introduction</ti ...
#> [3] <back>\n  <ref-list>\n    <title>References</title>\n    <ref id="pone.02 ...

Created on 2021-06-03 by the reprex package (v2.0.0)

I'll need to change the documentation accordingly.
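
Until the documentation is updated, the PMID-to-PMCID step can be pulled out into a small helper. The following is only a sketch: `open_access_pmcids()` is a hypothetical name, not a function in europepmc, and it assumes the search result has the `isOpenAccess` and `pmcid` columns used in the example above. The data frame is a made-up, offline stand-in for the output of `europepmc::epmc_search()`.

```r
# Hypothetical helper (not part of europepmc): collect the PMCIDs of
# open-access records from a search result.
open_access_pmcids <- function(search_result) {
  # Assumption: results without these columns contain nothing retrievable.
  if (!all(c("isOpenAccess", "pmcid") %in% names(search_result))) {
    return(character(0))
  }
  open_access <- search_result[["isOpenAccess"]]
  is_oa <- !is.na(open_access) & open_access == "Y"
  ids <- search_result[["pmcid"]][is_oa]
  # Drop records that are flagged open access but carry no PMCID.
  as.character(ids[!is.na(ids)])
}

# Made-up stand-in for a search result; the values are illustrative only.
example_hits <- data.frame(
  pmcid = c("PMC6864198", NA, "PMC0000001"),
  isOpenAccess = c("Y", "N", "N"),
  stringsAsFactors = FALSE
)
open_access_pmcids(example_hits)
#> [1] "PMC6864198"
```

Each returned ID could then be passed to `europepmc::epmc_ftxt()` one at a time, wrapped in `tryCatch()` as in the loop above.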

ESPoppelaars commented 3 years ago

Ah I see, PMCID works indeed. Good to know! Besides adding it to the documentation, it might also be nice to get an error message if a pmcid isn't used, e.g.:

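ext_id <- "31762656"
# Anchor the pattern and require at least one digit, so that only IDs of the
# form "PMC" followed by digits pass the check.
if (!grepl(pattern = "^PMC\\d+$", ext_id)) {
    warning("Input needs to be a PMCID.")
}
#> Warning: Input needs to be a PMCID.
ext_id <- "PMC6864198"
if (!grepl(pattern = "^PMC\\d+$", ext_id)) {
    warning("Input needs to be a PMCID.")
}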
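
If a hard error is preferred over a warning, the same check could be wrapped in a function that stops early. This is a sketch under assumptions, not the package's implementation: `check_pmcid()` is a hypothetical name, and the pattern assumes a PMCID is "PMC" followed by one or more digits.

```r
# Hypothetical validator (not part of europepmc): stop unless the input is
# a single string of the form "PMC" followed by digits.
check_pmcid <- function(ext_id) {
  ok <- is.character(ext_id) &&
    length(ext_id) == 1L &&
    !is.na(ext_id) &&
    grepl("^PMC[0-9]+$", ext_id)
  if (!ok) {
    stop("Input needs to be a single PMCID, e.g. \"PMC6864198\".", call. = FALSE)
  }
  # Return the ID invisibly so the check can sit at the top of a function.
  invisible(ext_id)
}

check_pmcid("PMC6864198")   # passes silently
try(check_pmcid("31762656")) # reports the error without halting a script
```

Anchoring the pattern with `^` and `$` rejects strings that merely contain "PMC", and `+` rejects a bare "PMC" with no digits.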