ropensci / rcrossref

R client for various CrossRef APIs
https://docs.ropensci.org/rcrossref
Other

id_converter() not converting PMIDs correctly #205

Closed Adafede closed 4 years ago

Adafede commented 4 years ago
Session Info

```r
R version 4.0.0 (2020-04-24)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.5

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] fr_CH.UTF-8/fr_CH.UTF-8/fr_CH.UTF-8/C/fr_CH.UTF-8/fr_CH.UTF-8

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
 [1] zoo_1.8-8 XML_3.99-0.3 webchem_1.0.0 UpSetR_1.4.0 forcats_0.5.0
 [6] tidyr_1.1.0 tibble_3.0.1 tidyverse_1.3.0 taxize_0.9.96 stringr_1.4.0
[11] stringi_1.4.6 splitstackshape_1.4.8 rvest_0.3.5 xml2_1.3.2 reticulate_1.16
[16] rentrez_1.2.2 readxl_1.3.1 readr_1.3.1 rcrossref_1.0.0 RColorBrewer_1.1-2
[21] purrr_0.3.4 pbmcapply_1.5.0 jsonlite_1.6.1 igraph_1.2.5 ggraph_2.0.3
[26] eulerr_6.1.0 dplyr_1.0.0 digest_0.6.25 data.table_1.12.8 collapsibleTree_0.1.7
[31] chorddiag_0.1.2 ChemmineR_3.40.0 plotly_4.9.2.1 Hmisc_4.4-0 ggplot2_3.3.1
[36] Formula_1.2-3 survival_3.1-12 lattice_0.20-41

loaded via a namespace (and not attached):
 [1] colorspace_1.4-1 rjson_0.2.20 ellipsis_0.3.1 htmlTable_1.13.3 fs_1.4.1 base64enc_0.1-3
 [7] httpcode_0.3.0 rstudioapi_0.11 farver_2.0.3 urltools_1.7.3 graphlayouts_0.7.0 ggrepel_0.8.2
[13] DT_0.13 lubridate_1.7.8 fansi_0.4.1 codetools_0.2-16 splines_4.0.0 bold_1.0.0
[19] knitr_1.28 polyclip_1.10-0 broom_0.5.6 dbplyr_1.4.4 cluster_2.1.0 png_0.1-7
[25] ggforce_0.3.1 shiny_1.4.0.2 data.tree_0.7.11 compiler_4.0.0 httr_1.4.1 backports_1.1.7
[31] assertthat_0.2.1 Matrix_1.2-18 fastmap_1.0.1 lazyeval_0.2.2 cli_2.0.2 later_1.1.0.1
[37] tweenr_1.0.1 acepack_1.4.1 htmltools_0.4.0 tools_4.0.0 gtable_0.3.0 glue_1.4.1
[43] rsvg_2.1 tinytex_0.23 Rcpp_1.0.4.6 cellranger_1.1.0 vctrs_0.3.1 crul_0.9.0
[49] ape_5.4 nlme_3.1-148 iterators_1.0.12 xfun_0.14 mime_0.9 miniUI_0.1.1.1
[55] lifecycle_0.2.0 MASS_7.3-51.6 scales_1.1.1 tidygraph_1.2.0 hms_0.5.3 promises_1.1.0
[61] curl_4.3 gridExtra_2.3 triebeard_0.3.0 rpart_4.1-15 reshape_0.8.8 latticeExtra_0.6-29
[67] foreach_1.5.0 checkmate_2.0.0 bibtex_0.4.2.2 rlang_0.4.6 pkgconfig_2.0.3 bitops_1.0-6
[73] htmlwidgets_1.5.1 tidyselect_1.1.0 plyr_1.8.6 magrittr_1.5 R6_2.4.1 generics_0.0.2
[79] DBI_1.1.0 haven_2.3.1 pillar_1.4.4 foreign_0.8-80 withr_2.2.0 RCurl_1.98-1.2
[85] nnet_7.3-14 modelr_0.1.8 crayon_1.3.4 viridis_0.5.1 jpeg_0.1-8.1 grid_4.0.0
[91] blob_1.2.1 reprex_0.3.0 xtable_1.8-4 httpuv_1.5.4 munsell_0.5.0 viridisLite_0.3.0
```

Hi,

Thank you very much for your beautiful package.

I am using your package to retrieve DOIs from various sources. When working with titles, I use your cr_works() function which is great.

However, when working with PubMed IDs, I face the following issue:

Some valid PubMed IDs do not seem to be recognized.

As an example: 28371833

This is the output I get when using id_converter("28371833", "pmid"):

$status
[1] "ok"

$responseDate
[1] "2020-06-08 02:03:29"

$request
[1] "tool=rcrossref;email=myrmecocystus%40gmail.com;ids=28371833;idtype=pmid;format=json"

$records
      pmid  live status             errmsg
1 28371833 false  error invalid article id

However, the article ID is valid, as is easily confirmed with entrez_summary(db = "pubmed", id = "28371833")[["title"]]:

"Cytochrome P450 Monooxygenase CYP716A141 is a Unique β-Amyrin C-16β Oxidase Involved in Triterpenoid Saponin Biosynthesis in Platycodon grandiflorus."

It has nothing to do with the erratum; I checked other entries.

Some other IDs (31708947) work, and I could not say why...

If any other information is needed, I am happy to give more details!

sckott commented 4 years ago

thanks for the report, having a look

sckott commented 4 years ago

The API request is here https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/?tool=rcrossref&email=myrmecocystus%40gmail.com&ids=28371833&idtype=pmid&format=json which gives the same response. So the problem is on the NCBI end of things. Not sure why they're saying it's an invalid article ID.
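For anyone who wants to reproduce that request outside of rcrossref, here is a minimal sketch in base R that rebuilds the URL from its query parameters. `build_idconv_url()` is a hypothetical helper, not an rcrossref function, and the `tool` and `email` defaults are placeholders.

```r
# Hypothetical helper (not part of rcrossref): rebuild the ID Converter
# request URL from its query parameters, using base R only.
build_idconv_url <- function(ids, idtype = NULL,
                             tool = "my_tool",
                             email = "my_email@example.com",
                             format = "json") {
  base <- "https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/"
  # Percent-encode a single value; reserved = TRUE also encodes "@" and "/"
  enc <- function(x) utils::URLencode(as.character(x), reserved = TRUE)
  params <- c(
    tool = enc(tool),
    email = enc(email),
    # Encode each ID on its own, then join with literal commas
    ids = paste(vapply(ids, enc, character(1)), collapse = ","),
    # idtype is only added when supplied
    if (!is.null(idtype)) c(idtype = enc(idtype)),
    format = enc(format)
  )
  paste0(base, "?", paste(names(params), params, sep = "=", collapse = "&"))
}

# Same shape as the request above, with placeholder tool and email values
build_idconv_url("28371833", idtype = "pmid")
```

Opening the resulting URL in a browser makes it easy to check whether an odd result comes from the package or from the service itself.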

(related issue https://github.com/ropensci/rcrossref/issues/183 )

open citations corpus (https://github.com/ropenscilabs/citecorp) doesn't have that PMID either:

citecorp::oc_pmid2ids(28371833)
#> data frame with 0 columns and 0 rows

Adafede commented 4 years ago

My bad... sorry for not re-opening there!

Strange from NCBI...but entrez seems to do the job correctly

sckott commented 4 years ago

no worries about opening this issue.

It's hard to say why the problem is happening. The API service for the ID converter may be using some older database or something; there's no clarity on what's going on behind the scenes. For ID conversion you may be better off using rentrez.
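A minimal sketch of that rentrez route, assuming the PubMed summary record returned by entrez_summary() carries an `articleids` data frame with `idtype` and `value` columns. `doi_from_articleids()` is a hypothetical helper, and the toy values below are made up so the example runs offline.

```r
# Hypothetical helper: pull the DOI out of the `articleids` table of a
# PubMed summary record. Assumes columns named `idtype` and `value`.
doi_from_articleids <- function(articleids) {
  doi <- articleids$value[articleids$idtype == "doi"]
  # Return NA when the record lists no DOI
  if (length(doi) == 0) NA_character_ else doi[[1]]
}

# Offline illustration with made-up identifier values
toy <- data.frame(
  idtype = c("pubmed", "doi"),
  value = c("12345678", "10.1000/example"),
  stringsAsFactors = FALSE
)
doi_from_articleids(toy)

# With network access and rentrez installed (not run here):
# rec <- rentrez::entrez_summary(db = "pubmed", id = "28371833")
# doi_from_articleids(rec$articleids)
```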

maia-sh commented 4 years ago

Hi @sckott and @Adafede,

If I understand correctly, id_converter() is built on NLM's ID Converter API, which is limited to records in PMC.

@JimHokanson explains in https://github.com/ropensci/rentrez/issues/136#issuecomment-589332122

As for a workaround, @dwinter's rentrez allows you to make the conversion using rentrez::parse_pubmed_xml and rentrez::pubmed_fetch: https://github.com/ropensci/rentrez/issues/136#issuecomment-512060447

But that's a lot of extra data to download for just PMID-DOI conversion (when scaling to many records), so it would be great if there were a simpler converter. Ideally that also works from DOI to PMID (which is what I'm trying to do).

Here are some related links I've come across:
https://www.crossref.org/labs/pmid2doi/
https://www.pmid2cite.com/ (promising, but I'm not finding any open source or an API for batch processing)
Via their website:
https://www.pmid2cite.com/pmid-to-doi-converter
https://www.pmid2cite.com/doi-to-pmid-converter

I'd appreciate any further suggestions.

Adafede commented 4 years ago

Hi, I'm not sure this is the right place for your question, but anyway, the PubMed API does the job perfectly if you just aim at converting DOIs to PM(C)IDs and vice versa.

You can also download the PubMed conversion table locally if you really need it to be fast (have a look at https://www.ncbi.nlm.nih.gov/pmc/pmctopmid/).
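A rough sketch of how a lookup against such a local table could work, assuming it has been read into a data frame with `DOI`, `PMCID` and `PMID` columns. The column names, the file name in the comment, and `lookup_ids()` itself are assumptions, and the toy rows are made up.

```r
# Hypothetical lookup against a locally downloaded conversion table.
# Assumes the table has columns named DOI, PMCID and PMID.
lookup_ids <- function(table, doi = NULL, pmid = NULL) {
  hit <- rep(FALSE, nrow(table))
  # DOIs are compared case-insensitively
  if (!is.null(doi)) hit <- hit | tolower(table$DOI) %in% tolower(doi)
  if (!is.null(pmid)) hit <- hit | as.character(table$PMID) %in% as.character(pmid)
  table[hit, c("DOI", "PMCID", "PMID"), drop = FALSE]
}

# Toy table with made-up identifiers, standing in for the real file
toy_table <- data.frame(
  DOI = c("10.1000/example.a", "10.1000/example.b"),
  PMCID = c("PMC0000001", "PMC0000002"),
  PMID = c(11111111, 22222222),
  stringsAsFactors = FALSE
)
lookup_ids(toy_table, doi = "10.1000/EXAMPLE.B")
lookup_ids(toy_table, pmid = "11111111")

# With the real download (file name is an assumption, not run here):
# ids <- read.csv("PMC-ids.csv.gz", stringsAsFactors = FALSE)
# lookup_ids(ids, pmid = "28371833")
```

A lookup that returns zero rows means the identifier is absent from the table, not that it is invalid.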

maia-sh commented 4 years ago

Thanks, @Adafede. Unfortunately, the NLM converter doesn't work for DOIs not available in PMC, similar to the PMID limitation.

For example: 10.1056/NEJMoa1916623 https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/?tool=my_tool&email=my_email@example.com&ids=10.1056/NEJMoa1916623

sckott commented 4 years ago

> id_converter() is built on NLM's ID Converter API

correct

We used to have a function for that Crossref pmid2doi service (see ?rcrossref-defunct), but we made it defunct. I think it was too unreliable or went down, not sure.

Hadn't seen pmid2cite - agree that it doesn't look like there's any way to use it programmatically.

Your example of 10.1056/NEJMoa1916623 might be a case where it's so new that there isn't a PMID for it yet. Crossref and Unpaywall have the DOI, but they don't map it to other identifiers.

sckott commented 4 years ago

One additional option is Fatcat - see https://api.fatcat.wiki/redoc#operation/lookup_release

for example: https://api.fatcat.wiki/v0/release/lookup?doi=10.1056/NEJMoa1916623
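A minimal sketch of using that lookup from R, assuming the JSON response nests external identifiers under an `ext_ids` object with a `pmid` field. Both helper names are hypothetical, and the sample list below is made up so the example runs offline.

```r
# Hypothetical helper: build the Fatcat release lookup URL for a DOI,
# in the same form as the example URL above.
fatcat_lookup_url <- function(doi) {
  paste0("https://api.fatcat.wiki/v0/release/lookup?doi=", doi)
}

# Hypothetical helper: read the PMID from a parsed release record.
# Assumes identifiers live under `ext_ids` (an assumption about the schema).
pmid_from_release <- function(release) {
  pmid <- release$ext_ids$pmid
  if (is.null(pmid)) NA_character_ else as.character(pmid)
}

fatcat_lookup_url("10.1056/NEJMoa1916623")

# Offline illustration with a made-up record
sample_release <- list(ext_ids = list(doi = "10.1000/example", pmid = "12345678"))
pmid_from_release(sample_release)

# With network access and jsonlite installed (not run here):
# rel <- jsonlite::fromJSON(fatcat_lookup_url("10.1056/NEJMoa1916623"))
# pmid_from_release(rel)
```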

sckott commented 4 years ago

at least I don't think there's anything left to do here

jvargh7 commented 1 year ago

Just in case someone stumbles on this awesome thread, do check out https://www.flickr.com/photos/dullhunk/454160748 that has some advice on this