ropensci / rdatacite

Wrapper to DataCite metadata
https://docs.ropensci.org/rdatacite

dc_search for relatedIdentifier possible? #26

Closed katrinleinweber closed 4 years ago

katrinleinweber commented 4 years ago

Can a web-based search for a relatedIdentifier=$doi be ported to dc_search? I tried these syntaxes:

rdatacite::dc_search("relatedIdentifier%3D%2210.1002%2Fbimj.201700219%22")
rdatacite::dc_search("relatedIdentifier='10.1002%2Fbimj.201700219'")
rdatacite::dc_search("relatedIdentifier:'10.1002%2Fbimj.201700219'")
rdatacite::dc_search("relatedIdentifier='10.1002/bimj.201700219'")
rdatacite::dc_search("relatedIdentifier:[10.1002/bimj.201700219]")
rdatacite::dc_search("relatedIdentifier:10.1002/bimj.201700219")
rdatacite::dc_search('relatedIdentifier:"10.1002/bimj.201700219"')
rdatacite::dc_search('relatedIdentifier%3A%5B10.1002%2Fbimj.201700219%5D')

but am always getting only

# A tibble: 0 x 0

I'm not sure whether I'm missing something, or what else to try. Thanks for any hints here :-)

Session Info

```r
- Session info ---------------------------------------------------------------
 setting  value
 version  R version 3.6.1 (2019-07-05)
 os       Windows 8.1 x64
 system   x86_64, mingw32
 ui       RStudio
 language en
 collate  German_Germany.1252
 ctype    German_Germany.1252
 tz       Europe/Berlin
 date     2019-10-22

- Packages -------------------------------------------------------------------
 package       * version  date       lib source
 assertthat      0.2.1    2019-03-21 [1] CRAN (R 3.6.0)
 backports       1.1.5    2019-10-02 [1] CRAN (R 3.6.1)
 callr           3.3.2    2019-09-22 [1] CRAN (R 3.6.1)
 cli             1.1.0    2019-03-19 [1] CRAN (R 3.6.0)
 colorspace      1.4-1    2019-03-18 [1] CRAN (R 3.6.0)
 crayon          1.3.4    2017-09-16 [1] CRAN (R 3.6.0)
 crul            0.8.4    2019-08-02 [1] CRAN (R 3.6.1)
 curl            4.2      2019-09-24 [1] CRAN (R 3.6.1)
 desc            1.2.0    2018-05-01 [1] CRAN (R 3.6.0)
 devtools      * 2.2.1    2019-09-24 [1] CRAN (R 3.6.1)
 digest          0.6.21   2019-09-20 [1] CRAN (R 3.6.1)
 dplyr         * 0.8.3    2019-07-04 [1] CRAN (R 3.6.1)
 ellipsis        0.3.0    2019-09-20 [1] CRAN (R 3.6.1)
 fansi           0.4.0    2018-10-05 [1] CRAN (R 3.6.0)
 fs              1.3.1    2019-05-06 [1] CRAN (R 3.6.0)
 ggplot2       * 3.2.1    2019-08-10 [1] CRAN (R 3.6.1)
 glue            1.3.1    2019-03-12 [1] CRAN (R 3.6.0)
 gridExtra       2.3      2017-09-09 [1] CRAN (R 3.6.1)
 gtable          0.3.0    2019-03-25 [1] CRAN (R 3.6.0)
 hms             0.5.1    2019-08-23 [1] CRAN (R 3.6.1)
 httpcode        0.2.0    2016-11-14 [1] CRAN (R 3.6.0)
 httr            1.4.1    2019-08-05 [1] CRAN (R 3.6.1)
 inline          0.3.15   2018-05-18 [1] CRAN (R 3.6.1)
 jsonlite        1.6      2018-12-07 [1] CRAN (R 3.6.0)
 knitr           1.25     2019-09-18 [1] CRAN (R 3.6.1)
 lazyeval        0.2.2    2019-03-15 [1] CRAN (R 3.6.0)
 lifecycle       0.1.0    2019-08-01 [1] CRAN (R 3.6.1)
 lubridate       1.7.4    2018-04-11 [1] CRAN (R 3.6.0)
 magrittr        1.5      2014-11-22 [1] CRAN (R 3.6.0)
 MASS            7.3-51.4 2019-03-31 [2] CRAN (R 3.6.1)
 memoise         1.1.0    2017-04-21 [1] CRAN (R 3.6.0)
 munsell         0.5.0    2018-06-12 [1] CRAN (R 3.6.0)
 oai           * 0.3.0    2019-09-07 [1] CRAN (R 3.6.1)
 packrat         0.5.0    2018-11-14 [1] CRAN (R 3.6.1)
 pillar          1.4.2    2019-06-29 [1] CRAN (R 3.6.0)
 pkgbuild        1.0.6    2019-10-09 [1] CRAN (R 3.6.1)
 pkgconfig       2.0.3    2019-09-22 [1] CRAN (R 3.6.1)
 pkgdown       * 1.4.1    2019-09-15 [1] CRAN (R 3.6.1)
 pkgload         1.0.2    2018-10-29 [1] CRAN (R 3.6.0)
 plyr            1.8.4    2016-06-08 [1] CRAN (R 3.6.0)
 prettyunits     1.0.2    2015-07-13 [1] CRAN (R 3.6.0)
 processx        3.4.1    2019-07-18 [1] CRAN (R 3.6.1)
 ps              1.3.0    2018-12-21 [1] CRAN (R 3.6.0)
 purrr         * 0.3.3    2019-10-18 [1] CRAN (R 3.6.1)
 R6              2.4.0    2019-02-14 [1] CRAN (R 3.6.0)
 Rcpp            1.0.2    2019-07-25 [1] CRAN (R 3.6.1)
 rdatacite       0.4.2    2019-05-07 [1] CRAN (R 3.6.1)
 readr         * 1.3.1    2018-12-21 [1] CRAN (R 3.6.0)
 remotes         2.1.0    2019-06-24 [1] CRAN (R 3.6.1)
 rlang           0.4.0    2019-06-25 [1] CRAN (R 3.6.0)
 rprojroot       1.3-2    2018-01-03 [1] CRAN (R 3.6.0)
 rstudioapi      0.10     2019-03-19 [1] CRAN (R 3.6.0)
 scales          1.0.0    2018-08-09 [1] CRAN (R 3.6.0)
 sessioninfo     1.1.1    2018-11-05 [1] CRAN (R 3.6.0)
 skimr           1.0.7    2019-06-20 [1] CRAN (R 3.6.1)
 solrium         1.0.2    2018-12-13 [1] CRAN (R 3.6.1)
 StanHeaders     2.19.0   2019-09-07 [1] CRAN (R 3.6.1)
 stringi       * 1.4.3    2019-03-12 [1] CRAN (R 3.6.0)
 stringr         1.4.0    2019-02-10 [1] CRAN (R 3.6.0)
 testthat      * 2.2.1    2019-07-25 [1] CRAN (R 3.6.1)
 tibble          2.1.3    2019-06-06 [1] CRAN (R 3.6.0)
 tidyr           1.0.0    2019-09-11 [1] CRAN (R 3.6.1)
 tidyselect      0.2.5    2018-10-11 [1] CRAN (R 3.6.0)
 triebeard       0.3.0    2016-08-04 [1] CRAN (R 3.6.0)
 urltools        1.7.3    2019-04-14 [1] CRAN (R 3.6.0)
 usethis       * 1.5.1    2019-07-04 [1] CRAN (R 3.6.1)
 utf8            1.1.4    2018-05-24 [1] CRAN (R 3.6.0)
 vctrs           0.2.0    2019-07-05 [1] CRAN (R 3.6.1)
 ViewPipeSteps   0.1.0    2019-10-09 [1] Github (daranzolin/ViewPipeSteps@0772271)
 withr           2.1.2    2018-03-15 [1] CRAN (R 3.6.0)
 writexl         1.1      2018-12-02 [1] CRAN (R 3.6.1)
 xfun            0.10     2019-10-01 [1] CRAN (R 3.6.1)
 xml2            1.2.2    2019-08-09 [1] CRAN (R 3.6.1)
 zeallot         0.1.0    2018-01-28 [1] CRAN (R 3.6.0)

[1] C:/USERNAME/R/win-library/3.6
[2] C:/Program Files/R/R-3.6.1/library
```

sckott commented 4 years ago

Thanks for the issue, @katrinleinweber.

All inputs have to be named; see the examples. This should work:

dc_search(q = "relatedIdentifier:\"10.1002/bimj.201700219\"")
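
A minimal sketch of why the argument name matters, assuming (this is not confirmed anywhere in this thread) that dc_search collects its query parameters through `...`. `toy_search` is a hypothetical stand-in, not rdatacite code; it only illustrates that an unnamed string arrives without a parameter name, so there is nothing to send as `q=`.

```r
# Hypothetical stand-in for a wrapper that forwards named arguments as
# URL query parameters. This is NOT rdatacite's actual implementation.
toy_search <- function(...) {
  args <- list(...)
  arg_names <- names(args)
  if (is.null(arg_names)) arg_names <- rep("", length(args))
  # Keep only named arguments; unnamed ones have no parameter to map to
  args[nzchar(arg_names)]
}

# Single quotes on the outside avoid escaping the inner double quotes
query <- 'relatedIdentifier:"10.1002/bimj.201700219"'

unnamed <- toy_search(query)      # nothing usable is forwarded
named   <- toy_search(q = query)  # forwarded as the parameter "q"

length(unnamed)
names(named)
```

Under that assumption, every unnamed attempt in the original post would send no query at all, whatever quoting or encoding was used inside the string.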

katrinleinweber commented 4 years ago

OK, thanks! Those details might be useful to add to #23, or to the README.

Regarding the data in that field, however, I'm getting less from dc_search than from the API directly:

library(magrittr)
download_from_DC <- function(doi) {
  doi %>%
    paste0('relatedIdentifier="', ., '"') %>%
    URLencode(reserved = TRUE) %>%
    paste0("https://api.datacite.org/dois?query=", .) %>%
    jsonlite::fromJSON()
}

download_from_DC("10.1002/bimj.201700219")$data$attributes$types
##    ris  bibtex        citeproc        schemaOrg
## 1 DATA    misc         dataset          Dataset
## 2 JOUR article article-journal ScholarlyArticle
##                  resourceType resourceTypeGeneral
## 1 phenotypic and genetic data             Dataset
## 2              JournalArticle                Text

It seems that rdatacite returns only the first row when the API returns a data frame, and omits the columns ris through schemaOrg:

x <- rdatacite::dc_search(q = "relatedIdentifier:\"10.1002/bimj.201700219\"")
sort(names(x))
##  [1] "_version_"           "allocator"           "allocator_facet"    
##  [4] "allocator_symbol"    "awardNumber"         "checked"            
##  [7] "contributor"         "contributorType"     "created"            
## [10] "creator"             "datacentre"          "datacentre_facet"   
## [13] "datacentre_symbol"   "dataset_id"          "date"               
## [16] "dateType"            "description"         "descriptionType"    
## [19] "doi"                 "format"              "funderIdentifier"   
## [22] "has_media"           "has_metadata"        "indexed"            
## [25] "is_active"           "minted"              "nameIdentifier"     
## [28] "namespace"           "prefix"              "publicationYear"    
## [31] "publisher"           "relatedIdentifier"   "resourceType"       
## [34] "resourceTypeGeneral" "schema_version"      "size"               
## [37] "state"               "subject"             "title"              
## [40] "updated"             "uploaded"            "url"                
## [43] "xml"
x$resourceType
## [1] "phenotypic and genetic data"
x$resourceTypeGeneral
## [1] "Dataset"

Is that intended? My self-made download... function returns a second related item, which does not seem to be present in dc_search's output.
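
The column difference can be checked offline with a small sketch. Both vectors are copied by hand from the outputs printed above (the `types` columns from the direct API call and the result of `sort(names(x))`); the variable names are only illustrative.

```r
# Columns of `types` from the direct API call (copied from the output above)
api_types_cols <- c("ris", "bibtex", "citeproc", "schemaOrg",
                    "resourceType", "resourceTypeGeneral")

# Column names returned by dc_search (copied from sort(names(x)) above)
dc_search_cols <- c(
  "_version_", "allocator", "allocator_facet", "allocator_symbol",
  "awardNumber", "checked", "contributor", "contributorType", "created",
  "creator", "datacentre", "datacentre_facet", "datacentre_symbol",
  "dataset_id", "date", "dateType", "description", "descriptionType",
  "doi", "format", "funderIdentifier", "has_media", "has_metadata",
  "indexed", "is_active", "minted", "nameIdentifier", "namespace",
  "prefix", "publicationYear", "publisher", "relatedIdentifier",
  "resourceType", "resourceTypeGeneral", "schema_version", "size",
  "state", "subject", "title", "updated", "uploaded", "url", "xml"
)

# Columns present in the API response but absent from dc_search's result
missing_from_dc_search <- setdiff(api_types_cols, dc_search_cols)
missing_from_dc_search
## [1] "ris"       "bibtex"    "citeproc"  "schemaOrg"
```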

sckott commented 4 years ago

I'll update the docs to clarify.

Hmm, well, your function and the rdatacite function use different base URLs: dc_search uses search.datacite.org/api, so there may be some differences due to that. For example, here's what dc_search calls under the hood:

curl -v 'https://search.datacite.org/api?q=relatedIdentifier%3A%2210.1002%2Fbimj.201700219%22&wt=json' | jq .

which only returns 1 record and doesn't contain those other fields you are looking for.
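
For a side-by-side look at the two requests, here is a rough sketch that only builds the URLs and makes no network call. `build_url` is a hypothetical helper; the base URLs and query strings are the ones quoted in this thread, and the `wt=json` suffix is taken from the curl command above.

```r
doi <- "10.1002/bimj.201700219"

# Hypothetical helper: percent-encode a query and append it to a base URL
build_url <- function(base, query, suffix = "") {
  paste0(base, utils::URLencode(query, reserved = TRUE), suffix)
}

# Request made by dc_search (field:"value" syntax, search.datacite.org)
solr_url <- build_url(
  "https://search.datacite.org/api?q=",
  paste0('relatedIdentifier:"', doi, '"'),
  suffix = "&wt=json"
)

# Request made by download_from_DC (field="value" syntax, api.datacite.org)
rest_url <- build_url(
  "https://api.datacite.org/dois?query=",
  paste0('relatedIdentifier="', doi, '"')
)

solr_url
rest_url
```

The two URLs differ in host, parameter name (`q` versus `query`), and separator (`:` versus `=`), so differing results are not surprising.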

So, I think we just need to get #24 done, then this will be sorted

sckott commented 4 years ago

this fxn is now gone in the refactor branch - closing