ropensci-archive / rorcid

:warning: ARCHIVED :warning: A programmatic interface to the Orcid.org API

Citation format problems #51

Closed gorkang closed 5 years ago

gorkang commented 6 years ago

The citation information seems to have some problems with "non-standard" characters (e.g. " ' ", "(", "&", "é", etc.)

Please, see the following example:

x = orcid_works("0000-0001-8642-6325", put_code = "26222298")
x$`0000-0001-8642-6325`$citation

The citation-value column is:

citation-value
1 @article{Juillerat_2015,doi = {10.4067/s0718-48082015000300006},url = {http://dx.doi.org/10.4067/s0718-48082015000300006},year = 2015,month = {dec},publisher = {{SciELO} Comision Nacional de Investigacion Cientifica Y Tecnologica ({CONICYT})},volume = {33},number = {3},pages = {221--238},author = {Karen L Juillerat and Felipe A Cornejo and Ram{\\'{o}}n D Castillo and Sergio E Chaigneau},title = {Procesamiento sem{\\'{a}}ntico de palabras epist{\\'{e}}micas y metaf{\\'{\\i}}sicas en ni{\\~{n}}os y adolescentes con Trastorno de Espectro Autista ({TEA}) y con Desarrollo T{\\'{\\i}}pico ({DT})},journal = {Terapia psicol{\\'{o}}gica}}

As you can see, the "non-standard" characters come back as LaTeX escape sequences wrapped in {}. See below the specific problems in this particular reference (plus some others I've found).

shown  -> expected
-------------------
{\\'{o}} -> ó
{\\'{a}} -> á
{\\'{e}} -> é
{\\'{\\i}} -> í
{\\~{n}} -> ñ
({TEA}) -> (TEA)
({DT}) -> (DT)

{\\&} -> &
{\\'a} -> á
{\textquotesingle} -> '

To make it more interesting, there are some inconsistencies (for example, "á" is sometimes encoded as "{\'{a}}" and sometimes as "{\'a}").
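A minimal base-R sketch of how both spellings could be decoded with one lookup table. The function name `decode_latex_accents` and the table are hypothetical, and the table covers only a small illustrative subset of accents. Each pattern is a complete, brace-balanced command, so the closing brace of the surrounding BibTeX field is left alone.

```r
# Illustrative subset of LaTeX accent commands mapped to precomposed characters.
accent_table <- list(
  "'"  = c(a = "\u00e1", e = "\u00e9", i = "\u00ed", o = "\u00f3", u = "\u00fa"),
  "~"  = c(n = "\u00f1", a = "\u00e3"),
  "\"" = c(i = "\u00ef", u = "\u00fc"),
  "^"  = c(e = "\u00ea")
)

decode_latex_accents <- function(x) {
  for (acc in names(accent_table)) {
    cmd <- paste0("\\", acc)  # one backslash followed by the accent character
    for (letter in names(accent_table[[acc]])) {
      repl <- accent_table[[acc]][[letter]]
      # BibTeX often writes the dotless i as \i, so accept both spellings.
      bases <- if (letter == "i") c("\\i", "i") else letter
      for (base in bases) {
        # Longest form first, so inner forms never leave stray braces behind.
        forms <- c(
          paste0("{", cmd, "{", base, "}}"),  # {\'{a}}
          paste0("{", cmd, base, "}"),        # {\'a}
          paste0(cmd, "{", base, "}"),        # \'{a}
          paste0(cmd, base)                   # \'a
        )
        for (f in forms) x <- gsub(f, repl, x, fixed = TRUE)
      }
    }
  }
  x
}

decoded <- decode_latex_accents("Terapia psicol{\\'{o}}gica")
```

Matching with `fixed = TRUE` avoids regex escaping of the backslashes and braces; uppercase letters and other accent commands would need more table entries.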

The session info is below (I am using rorcid's dev version):

Session Info
```
Session info ---------------------------------------------------------------
 setting  value
 version  R version 3.4.4 (2018-03-15)
 system   x86_64, linux-gnu
 ui       RStudio (1.1.441)
 language en_US
 collate  en_US.UTF-8
 tz       America/Santiago
 date     2018-04-05

Packages -------------------------------------------------------------------
 package      * version    date       source
 assertthat     0.2.0      2017-04-11 CRAN (R 3.4.0)
 backports      1.1.2      2017-12-13 CRAN (R 3.4.3)
 base         * 3.4.4      2018-03-16 local
 bib2df       * 1.0.0      2017-09-18 CRAN (R 3.4.2)
 bindr          0.1.1      2018-03-13 CRAN (R 3.4.3)
 bindrcpp     * 0.2        2017-06-17 CRAN (R 3.4.0)
 bookdown       0.7        2018-02-18 CRAN (R 3.4.3)
 broom          0.4.3      2017-11-20 CRAN (R 3.4.2)
 cellranger     1.1.0      2016-07-27 CRAN (R 3.4.0)
 cli            1.0.0      2017-11-05 CRAN (R 3.4.2)
 colorspace     1.3-2      2016-12-14 cran (@1.3-2)
 compiler       3.4.4      2018-03-16 local
 crayon         1.3.4      2017-09-16 CRAN (R 3.4.1)
 crul           0.5.2      2018-02-24 CRAN (R 3.4.3)
 curl           3.1        2017-12-12 CRAN (R 3.4.3)
 data.table     1.10.4-3   2017-10-27 CRAN (R 3.4.2)
 datasets     * 3.4.4      2018-03-16 local
 devtools       1.13.5     2018-02-18 CRAN (R 3.4.3)
 digest         0.6.15     2018-01-28 CRAN (R 3.4.3)
 dplyr        * 0.7.4      2017-09-28 CRAN (R 3.4.1)
 evaluate       0.10.1     2017-06-24 CRAN (R 3.4.0)
 forcats      * 0.3.0      2018-02-19 CRAN (R 3.4.3)
 foreign        0.8-69     2017-06-21 CRAN (R 3.4.0)
 ggplot2      * 2.2.1      2016-12-30 CRAN (R 3.4.4)
 glue           1.2.0      2017-10-29 CRAN (R 3.4.2)
 graphics     * 3.4.4      2018-03-16 local
 grDevices    * 3.4.4      2018-03-16 local
 grid           3.4.4      2018-03-16 local
 gtable         0.2.0      2016-02-26 cran (@0.2.0)
 haven          1.1.1      2018-01-18 CRAN (R 3.4.3)
 hms            0.4.2      2018-03-10 CRAN (R 3.4.3)
 htmltools      0.3.6      2017-04-28 CRAN (R 3.4.0)
 httr           1.3.1      2017-08-20 CRAN (R 3.4.1)
 humaniformat   0.6.0      2016-04-24 CRAN (R 3.4.2)
 jsonlite       1.5        2017-06-01 CRAN (R 3.4.0)
 knitr          1.20       2018-02-20 CRAN (R 3.4.3)
 lattice        0.20-35    2017-03-25 CRAN (R 3.3.3)
 lazyeval       0.2.1      2017-10-29 CRAN (R 3.4.2)
 lubridate      1.7.3      2018-02-27 CRAN (R 3.4.3)
 magrittr       1.5        2014-11-22 CRAN (R 3.2.1)
 memoise        1.1.0      2017-04-21 CRAN (R 3.4.0)
 methods      * 3.4.4      2018-03-16 local
 mnormt         1.5-5      2016-10-15 CRAN (R 3.4.0)
 modelr         0.1.1      2017-07-24 CRAN (R 3.4.1)
 munsell        0.4.3      2016-02-13 cran (@0.4.3)
 nlme           3.1-131.1  2018-02-16 CRAN (R 3.4.3)
 openssl        1.0.1      2018-03-03 CRAN (R 3.4.3)
 pacman       * 0.4.6      2017-05-14 CRAN (R 3.4.0)
 parallel       3.4.4      2018-03-16 local
 pillar         1.2.1      2018-02-27 CRAN (R 3.4.3)
 pkgconfig      2.0.1      2017-03-21 cran (@2.0.1)
 plyr           1.8.4      2016-06-08 cran (@1.8.4)
 psych          1.7.8      2017-09-09 CRAN (R 3.4.1)
 purrr        * 0.2.4      2017-10-18 CRAN (R 3.4.2)
 R6             2.2.2      2017-06-17 CRAN (R 3.4.0)
 Rcpp           0.12.16    2018-03-13 CRAN (R 3.4.3)
 readr        * 1.1.1      2017-05-16 CRAN (R 3.4.0)
 readxl       * 1.0.0      2017-04-18 CRAN (R 3.4.0)
 reshape2       1.4.3      2017-12-11 CRAN (R 3.4.3)
 rlang          0.2.0      2018-02-20 CRAN (R 3.4.3)
 rmarkdown      1.9        2018-03-01 CRAN (R 3.4.3)
 rorcid       * 0.4.0.9444 2018-03-28 Github (ropensci/rorcid@58d1141)
 rprojroot      1.3-2      2018-01-03 CRAN (R 3.4.3)
 rstudioapi     0.7        2017-09-07 CRAN (R 3.4.1)
 rvest          0.3.2      2016-06-17 CRAN (R 3.4.0)
 scales         0.5.0.9000 2018-02-03 Github (hadley/scales@d767915)
 stats        * 3.4.4      2018-03-16 local
 stringi        1.1.7      2018-03-12 CRAN (R 3.4.3)
 stringr      * 1.3.0      2018-02-19 CRAN (R 3.4.3)
 tibble       * 1.4.2      2018-01-22 CRAN (R 3.4.3)
 tictoc         1.0        2014-06-17 CRAN (R 3.4.3)
 tidyr        * 0.8.0      2018-01-29 CRAN (R 3.4.3)
 tidyverse    * 1.2.1      2017-11-14 CRAN (R 3.4.2)
 tools          3.4.4      2018-03-16 local
 triebeard      0.3.0      2016-08-04 CRAN (R 3.4.0)
 urltools       1.7.0      2018-01-20 CRAN (R 3.4.3)
 utf8           1.1.3      2018-01-03 CRAN (R 3.4.3)
 utils        * 3.4.4      2018-03-16 local
 withr          2.1.2      2018-03-15 CRAN (R 3.4.3)
 xfun           0.1        2018-01-22 CRAN (R 3.4.3)
 xml2           1.2.0      2018-01-24 CRAN (R 3.4.3)
 yaml           2.1.18     2018-03-08 CRAN (R 3.4.3)
```

sckott commented 6 years ago

thanks @gorkang

I think the problem is on ORCID's end, or at least with their data providers, hard to say where that would go wrong. here's my analysis of the situation.

in R

library(rorcid)
get_json <- function(x) {
  x[[1]]$works$citation$`citation-value`
}
get_xml <- function(x) {
  xml2::xml_text(xml2::xml_find_first(x[[1]]$works, "//work:citation/work:citation-value"))
}

# json
orcid_works(orcid = "0000-0001-8642-6325", put_code = "26222298", format = "application/vnd.citationstyles.csl+json")
get_json(orcid_works(orcid = "0000-0001-8642-6325", put_code = "26222298", format = "application/json"))
get_json(orcid_works(orcid = "0000-0001-8642-6325", put_code = "26222298", format = "application/orcid+json; qs=2"))
get_json(orcid_works(orcid = "0000-0001-8642-6325", put_code = "26222298", format = "application/vnd.orcid+json; qs=4"))

# xml
get_xml(orcid_works(orcid = "0000-0001-8642-6325", put_code = "26222298", format = "application/xml"))
get_xml(orcid_works(orcid = "0000-0001-8642-6325", put_code = "26222298", format = "application/orcid+xml; qs=3"))
get_xml(orcid_works(orcid = "0000-0001-8642-6325", put_code = "26222298", format = "application/vnd.orcid+xml; qs=5"))

with curl on the command line

# json
curl -H 'Authorization: Bearer <your token>' -H "Accept: application/vnd.citationstyles.csl+json" 'https://pub.orcid.org/v2.1/0000-0001-8642-6325/work/26222298' | jq .
curl -H 'Authorization: Bearer <your token>' -H "Accept: application/json" 'https://pub.orcid.org/v2.1/0000-0001-8642-6325/work/26222298' | jq '.citation."citation-value"'
curl -H 'Authorization: Bearer <your token>' -H "Accept: application/orcid+json; qs=2" 'https://pub.orcid.org/v2.1/0000-0001-8642-6325/work/26222298' | jq '.citation."citation-value"'
curl -H 'Authorization: Bearer <your token>' -H "Accept: application/vnd.orcid+json; qs=4" 'https://pub.orcid.org/v2.1/0000-0001-8642-6325/work/26222298' | jq '.citation."citation-value"'

# xml
curl -H 'Authorization: Bearer <your token>' -H "Accept: application/xml" 'https://pub.orcid.org/v2.1/0000-0001-8642-6325/work/26222298' | grep 'work:citation-value'
curl -H 'Authorization: Bearer <your token>' -H "Accept: application/orcid+xml; qs=3" 'https://pub.orcid.org/v2.1/0000-0001-8642-6325/work/26222298'  | grep 'work:citation-value'
curl -H 'Authorization: Bearer <your token>' -H "Accept: application/vnd.orcid+xml; qs=5" 'https://pub.orcid.org/v2.1/0000-0001-8642-6325/work/26222298' | grep 'work:citation-value'

The only format that works (has correctly formatted characters) is application/vnd.citationstyles.csl+json

@rcpeters any thoughts on where the problem may lie? i guess if it's with data providers to ORCID that may be not fixable

gorkang commented 6 years ago

In the meantime, and in case this is useful to someone, I am "manually" cleaning the affected columns using the function below (I am sure it can be improved):

library(dplyr)

# Strip BibTeX/LaTeX escapes from one column of a data frame.
# Braces and backslashes are removed first, so the patterns further down match
# the leftover accent marker plus letter (e.g. 'a), not the full LaTeX command.
clean_orcid <- function(df, column_input, column_output = "clean_column") {
  df %>% 
    mutate(temp_name = get(column_input))  %>% 
    mutate(
      temp_name = gsub("\\{", "", temp_name),
      temp_name = gsub("\\}", "", temp_name),
      temp_name = gsub("\\\\", "", temp_name),

      # acute accents
      temp_name = gsub("\\'a", "á", temp_name),
      temp_name = gsub("\\'e", "é", temp_name),
      temp_name = gsub("\\'i", "í", temp_name),
      temp_name = gsub("\\'o", "ó", temp_name),
      temp_name = gsub("\\'u", "ú", temp_name),

      # tilde, diaeresis, circumflex and other marks
      temp_name = gsub("\\~n", "ñ", temp_name),
      temp_name = gsub('\\"i', "ï", temp_name),
      temp_name = gsub("\\^e", "ê", temp_name),
      temp_name = gsub("\\?\\~", "ç", temp_name),
      temp_name = gsub("\\~a", "ã", temp_name),
      temp_name = gsub('\\"u', "ü", temp_name),
      temp_name = gsub("n\\~([aeiou])", "ñ\\1", temp_name),

      # named text symbols (their backslash was already stripped above)
      temp_name = gsub("textquotesingle", "'", temp_name),
      temp_name = gsub("textquotedblleft", "\"", temp_name),
      temp_name = gsub("textquotedblright", "\"", temp_name),
      temp_name = gsub("textquestiondown", "¿", temp_name),
      temp_name = gsub("\\&amp;", "&", temp_name)
    ) %>% 
    mutate(!!column_output := temp_name) %>% 
    select(-temp_name)

}
rcpeters commented 6 years ago

I've used this example in past presentations: curl https://pub.orcid.org/v2.1/0000-0002-0036-9460/works/27038790

In my personal opinion the citation field is nearly useless due to variations in citation formats and sub-formats. Instead, PIDs should be provided and resolved, which allows a better reflection of the diverse needs of the research community. We are working toward this.

sckott commented 6 years ago

@rcpeters 😆 nice one

By PIDs do you mean using content negotiation of DOIs for the works? But some works won't have DOIs though, right?

"We are working toward this."

what's the timeline?

rcpeters commented 6 years ago

Members pushing data are required to provide an external identifier. We are looking at testing requiring users doing self entry to do the same this year. The timeframe for 100% compliance would be in years. But the percentage of works with external identifiers will continuously increase.

sckott commented 6 years ago

thanks for the info

sckott commented 6 years ago

@gorkang in the meantime, can you use format = "application/vnd.citationstyles.csl+json" when you use orcid_works?

e.g.,

orcid_works("0000-0001-8642-6325", put_code = "26222298", format = "application/vnd.citationstyles.csl+json")[[1]]$works
#>               id            type                                                                        author date-parts date-parts    collection-title
#> 1 Juillerat_2015 article-journal Juillerat, Cornejo, Castillo, Chaigneau, Karen L, Felipe A, Ramón D, Sergio E   2015, 12   2015, 12 Terapia psicológica
#>       container-title                             DOI issue number number-of-pages    page page-first
#> 1 Terapia psicológica 10.4067/s0718-48082015000300006     3      3              18 221-238        221
#>                                                                      publisher
#> 1 SciELO Comision Nacional de Investigacion Cientifica Y Tecnologica (CONICYT)
#>                                                                                                                                                      title
#> 1 Procesamiento semántico de palabras epistémicas y metafsicas en niños y adolescentes con Trastorno de Espectro Autista (TEA) y con Desarrollo Tpico (DT)
#>                                                 URL volume
#> 1 http://dx.doi.org/10.4067/s0718-48082015000300006     33
gorkang commented 6 years ago

It seems I can use it, but only with a single put-code. When I send a bunch of put-codes (see code below):

Error: Not Acceptable (HTTP 406)

library(rorcid)
library(dplyr)

id = "0000-0001-7678-8656"
from_year = 2015 # hypothetical cutoff; the original snippet does not define from_year

# 1. Get put-codes
put_codes = orcid_works(id)[[1]] %>%  
  bind_rows() %>% 
  filter(`publication-date.year.value` >= from_year) %>% # we only ask for the records we need to minimize # of calls.
  pull(`put-code`)

# 2. Get info of those put codes
list_orcid <- orcid_works(id, put_code = put_codes , format = "application/vnd.citationstyles.csl+json")
sckott commented 6 years ago

hi @gorkang looking at this now,

@rcpeters not sure i understand what's going wrong here. looks like accept type application/vnd.citationstyles.csl+json only accepts 1 put code at a time? whereas accept application/json accepts many put codes. Seems like the error message is throwing down 3 different http status codes? 405, 406, and 415?

curl -vL -H 'Authorization: Bearer <token>' -H 'Accept: application/vnd.citationstyles.csl+json' https://pub.orcid.org/v2.1/0000-0001-7678-8656/works/44944196,41661614,43723827,43723548,41661610,41661605,41661603,41661618,41661611,44848794,41661601,41661617,41661613,41661621,41661619,4166161

{"responseCode":406,"developerMessage":"400 Bad Request: There is an issue with your data or the API endpoint. 405 Method Not Allowed: Endpoint and method mismatch. 415 Unsupported Media Type: data must be in XML or JSON format.","userMessage":"ORCID could not process the data, because they were invalid.","errorCode":9001,"moreInfo":"https://members.orcid.org/api/resources/troubleshooting"}

# vs. 
curl -vL -H 'Authorization: Bearer <token>' -H 'Accept: application/json' https://pub.orcid.org/v2.1/0000-0001-7678-8656/works/44944196,41661614,43723827,43723548,41661610,41661605,41661603,41661618,41661611,44848794,41661601,41661617,41661613,41661621,41661619,41661612

200
rcpeters commented 6 years ago

We don't support bulk csl. Not sure we can easily add that. @TomDemeranville might be able to clarify.
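Since the CSL endpoint takes one put-code per request, one client-side workaround is to loop over the put-codes and collect the results. This is only a sketch: `fetch_each` and its `fetch_one` argument are hypothetical names, and the fetcher is passed in so the loop can be tried without network access.

```r
# Call `fetch_one` once per put-code; a failed request yields NULL for that
# code instead of aborting the whole loop.
fetch_each <- function(put_codes, fetch_one) {
  results <- lapply(put_codes, function(code) {
    tryCatch(fetch_one(code), error = function(e) NULL)
  })
  names(results) <- as.character(put_codes)
  results
}

# With rorcid, the fetcher could be the single put-code call used earlier in
# this thread (not run here, it needs a token and network access):
# fetch_each(put_codes, function(pc) {
#   rorcid::orcid_works(id, put_code = pc,
#     format = "application/vnd.citationstyles.csl+json")
# })
```

This costs one HTTP request per work, so it will be slower than the bulk JSON endpoint.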

sckott commented 6 years ago

thanks @rcpeters - i'm a bit lost with this particular error message

TomDemeranville commented 6 years ago

I think this is a case of us supporting multiple putcodes when requesting JSON/XML but not when requesting application/vnd.citationstyles.csl+json

The application/vnd.citationstyles.csl+json does things slightly different to the others. If a work includes a bibtex citation, it cleans it up, adds missing dois/urls and returns that. Otherwise it tries to create a citation using the more limited data we hold. See https://github.com/ORCID/ORCID-Source/blob/149f2dc811613971747d6507d6ae9e41ec9cc0f1/orcid-api-common/src/main/java/org/orcid/api/common/writer/citeproc/WorkToCiteprocTranslator.java

I can only assume that the cleaning fixes the unicode issues in citations provided by publishers, which is why it's working for you when others are not.

I have to say bibtex is horrible in this regard. How unicode/accents are encoded appears to vary from implementation to implementation.

TomDemeranville commented 6 years ago

Also note, please don't rely on the citation being in the work metadata. It's not always present and not always bibtex.
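A small sketch of that check on the client side, assuming a work parsed into an R list whose `citation` element carries `citation-type` and `citation-value` fields. The helper name `get_bibtex` and the sample records are hypothetical.

```r
# Return the BibTeX string of a parsed work, or NA when the citation is
# missing or is not BibTeX.
get_bibtex <- function(work) {
  cit <- work[["citation"]]
  if (is.null(cit)) return(NA_character_)
  type <- cit[["citation-type"]]
  value <- cit[["citation-value"]]
  if (is.null(type) || is.null(value)) return(NA_character_)
  # Compare case-insensitively, since the spelling of the type may vary.
  if (toupper(type) != "BIBTEX") return(NA_character_)
  value
}

# Made-up records for illustration only.
with_bibtex <- list(citation = list(`citation-type` = "BIBTEX",
                                    `citation-value` = "@article{x, year = 2015}"))
other_type  <- list(citation = list(`citation-type` = "FORMATTED_UNSPECIFIED",
                                    `citation-value` = "Some formatted text"))
no_citation <- list(title = "A work without a citation")
```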

This library https://github.com/ORCID/orcid-js does the following to create a citation. You may want to think about doing something similar:

Here is a JS example that downloads a bibtex file based on above logic: https://github.com/ORCID/orcid-js/blob/master/exampledownloadbibtex.html

sckott commented 6 years ago

thanks for this info @TomDemeranville

thanks for the warning on bibtex.

and thanks for the workflow to create a citation - i'll give it a shot

sckott commented 6 years ago

@gorkang can you try a new function that I think addresses this at least in part?

install remotes::install_github("ropensci/rorcid@citations")

see fxn orcid_citations and its examples

gorkang commented 6 years ago

Hi @sckott

I am doing:

install.packages("remotes")
remotes::install_github("ropensci/rorcid@citations")
DF = rorcid::orcid_citations(orcid = "0000-0001-8642-6325", put_code = "26222298")
DF$citation

The rorcid version is 0.4.2.9115. For some reason I can't see the help (F1): "Error in fetch(key) : lazy-load database '~/R/x86_64-pc-linux-gnu-library/3.4/rorcid/help/rorcid.rdb' is corrupt"

And the end result is not ideal:

"{\"indexed\":{\"date-parts\":[[2018,5,8]],\"date-time\":\"2018-05-08T02:36:58Z\",\"timestamp\":1525747018591},\"reference-count\":0,\"publisher\":\"SciELO Comision Nacional de Investigacion Cientifica Y Tecnologica (CONICYT)\",\"issue\":\"3\",\"content-domain\":{\"domain\":[],\"crossmark-restriction\":false},\"DOI\":\"10.4067\/s0718-48082015000300006\",\"type\":\"article-journal\",\"created\":{\"date-parts\":[[2016,1,28]],\"date-time\":\"2016-01-28T12:24:16Z\",\"timestamp\":1453983856000},\"page\":\"221-238\",\"source\":\"Crossref\",\"is-referenced-by-count\":0,\"title\":\"Procesamiento sem\u00e1ntico de palabras epist\u00e9micas y metaf\u00edsicas en ni\u00f1os y adolescentes con Trastorno de Espectro Autista (TEA) y con Desarrollo T\u00edpico (DT)\",\"prefix\":\"10.4067\",\"volume\":\"33\",\"author\":[{\"given\":\"Karen L\",\"family\":\"Juillerat\",\"sequence\":\"first\",\"affiliation\":[]},{\"given\":\"Felipe A\",\"family\":\"Cornejo\",\"sequence\":\"additional\",\"affiliation\":[]},{\"given\":\"Ram\u00f3n D\",\"family\":\"Castillo\",\"sequence\":\"additional\",\"affiliation\":[]},{\"given\":\"Sergio E\",\"family\":\"Chaigneau\",\"sequence\":\"additional\",\"affiliation\":[]}],\"member\":\"2516\",\"published-online\":{\"date-parts\":[[2015,12]]},\"container-title\":\"Terapia psicol\u00f3gica\",\"original-title\":[],\"language\":\"en\",\"deposited\":{\"date-parts\":[[2016,1,28]],\"date-time\":\"2016-01-28T12:24:16Z\",\"timestamp\":1453983856000},\"score\":1.0,\"subtitle\":[],\"short-title\":[],\"issued\":{\"date-parts\":[[2015,12]]},\"references-count\":0,\"journal-issue\":{\"published-online\":{\"date-parts\":[[2015]]},\"issue\":\"3\"},\"alternative-id\":[\"S0718-48082015000300006\"],\"URL\":\"http:\/\/dx.doi.org\/10.4067\/s0718-48082015000300006\",\"relation\":{},\"ISSN\":[\"0718-4808\"],\"container-title-short\":\"Ter Psicol\"}"

Thanks!

sckott commented 6 years ago

@gorkang the manual file should be fixed now.

what result do you want?

gorkang commented 6 years ago

Thanks @sckott !

OK, now I can see bibtex is still not supported... that would be one of the things I would expect from orcid_citations().

Also, the accents and special characters are still not showing up. For example:

IS --> SHOULD BE
Ram\u00f3n --> Ramón
epist\u00e9micas --> epistémicas
ni\u00f1os --> niños
sem\u00e1ntico --> semántico
...
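The `\uXXXX` sequences are standard JSON string escapes, so they should turn into the accented characters once the string is parsed as JSON. A minimal sketch with jsonlite, assuming the citation column holds the raw JSON text; the `raw` string here is a shortened, hand-written stand-in for the real response.

```r
library(jsonlite)

# Shortened stand-in for the raw CSL-JSON string returned above.
raw <- "{\"title\":\"sem\\u00e1ntico\",\"author\":[{\"given\":\"Ram\\u00f3n D\",\"family\":\"Castillo\"}]}"

# fromJSON() decodes the \u escapes while parsing.
parsed <- fromJSON(raw)
title <- parsed$title
given <- parsed$author$given
```

With the real data this would be something like `fromJSON(DF$citation)`, after which fields such as `title` and `author` can be read with the accents intact.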

sckott commented 6 years ago

Well, Crossref can give back bibtex, but ORCID can not. So this fxn could give back bibtex (or other formats) from Crossref, but wouldn't be able to give bibtex for ORCID. So then results would be of mixed type, which isn't great. But I suppose results can be marked by their type to easily filter.

sckott commented 6 years ago

@gorkang reinstall remotes::install_github("ropensci/rorcid@citations") -

changed the fxn to do bibtex for crossref when DOIs are available, and csl-json still for when only a PUT record is found.

can now pass in format, style, and locale to rcrossref

gorkang commented 6 years ago

Thanks @sckott Just reinstalled and indeed the citation is in Bibtex!

The weird characters issue remains though.

Using the following code I get:

remove.packages("rorcid")
remotes::install_github("ropensci/rorcid@citations")
library(rorcid)

xxx = rorcid::orcid_citations(orcid = "0000-0001-8642-6325", put_code = "26222298", cr_locale = "es_CL")
xxx$citation

[1] "@article{Juillerat_2015,\n\tdoi = {10.4067/s0718-48082015000300006},\n\turl = {https://doi.org/10.4067%2Fs0718-48082015000300006},\n\tyear = 2015,\n\tmonth = {dec},\n\tpublisher = {{SciELO} Comision Nacional de Investigacion Cientifica Y Tecnologica ({CONICYT})},\n\tvolume = {33},\n\tnumber = {3},\n\tpages = {221--238},\n\tauthor = {Karen L Juillerat and Felipe A Cornejo and Ram{\'{o}}n D Castillo and Sergio E Chaigneau},\n\ttitle = {Procesamiento sem{\'{a}}ntico de palabras epist{\'{e}}micas y metaf{\'{\i}}sicas en ni{\~{n}}os y adolescentes con Trastorno de Espectro Autista ({TEA}) y con Desarrollo T{\'{\i}}pico ({DT})},\n\tjournal = {Terapia psicol{\'{o}}gica}\n}"

Where accented letters etc. show up between {}. For example:

{\'{o}} -> ó
{\'{a}} -> á

Not sure if it is my mistake (tried different cr_locales without luck).

sckott commented 6 years ago

sorry about that @gorkang I still don't know what's going wrong with the encoding. it's on their end, not coming from R itself.

gorkang commented 6 years ago

No worries. I ended up not using the citation because of the issues rcpeters mentioned, plus the encoding madness.

Feel free to close the issue if it bothers you. Otherwise, I am happy to test potential solutions and give feedback.

sckott commented 6 years ago

I'm thinking of hooking into pandoc - so will try to test that today. Do you already have pandoc installed?

sckott commented 6 years ago

hmm, pandoc may not work. how do you feel about a V8 dependency? does this install on your system https://cran.rstudio.com/web/packages/V8/

gorkang commented 6 years ago

V8 installs without problems. My only doubt comes from the package not being actively developed in the last year or so... https://github.com/jeroen/V8/graphs/contributors

sckott commented 6 years ago

Jeroen works with me so I think if there's any problems with the pkg they can be dealt with quickly.

sckott commented 6 years ago

the javascript thing didn't pan out, so we're out of options as far as I can tell.

gorkang commented 6 years ago

Should I close the issue, or do you prefer to leave it open?

sckott commented 6 years ago

I'm working on something new that should in theory fix this: https://github.com/ropensci/handlr/ - will ping you here when i've got something ready to try

sckott commented 6 years ago

@gorkang can you try the example below again after reinstalling remotes::install_github("ropensci/rorcid@citations")

orcid_citations(orcid = "0000-0001-8642-6325", put_code = "26222298", cr_locale = "es_CL")
gorkang commented 6 years ago

Thanks @sckott !

I am finding two issues. First, fetching all the citations in a profile is very slow. For example, with 55 pubs it takes ~85 seconds.

tictoc::tic()
df_citations = rorcid::orcid_citations(orcid = "0000-0001-8642-6325", cr_locale = "es_CL")
tictoc::toc()

86.791 sec elapsed


Second, I encountered the following error :

tictoc::tic()
df_citations = rorcid::orcid_citations(orcid = "0000-0001-6758-5101", cr_locale = "es_CL")
tictoc::toc()

Error in cn(dois, ...) : Format 'citeproc-json' for '10.3233/BEN-2012-0352' is not supported by the DOI registration agency: 'medra'. Try one of the following formats: rdf-xml, turtle, citeproc-json-ish, ris, bibtex, bibentry, onix-xml

sckott commented 6 years ago

running that top example, i'm not getting consistently different times between running through the new handlr package and not. I've added a special cr_format value, citeproc2bibtex: if given, we ask for citeproc-json from Crossref, then use handlr to convert it to bibtex. Otherwise, we ask for whatever is given in cr_format from Crossref.


note that the bibtex citations coming from handlr may need some tweaking still

sckott commented 6 years ago

the error with medra should go away now cause I had hard-coded the citeproc-json internally, but now that's removed

gorkang commented 6 years ago

It seems there is another error. Running the following:

remotes::install_github("ropensci/rorcid@citations")
tictoc::tic()
df_citations = rorcid::orcid_citations(orcid = "0000-0001-6758-5101", cr_locale = "es_CL")
tictoc::toc()

After a long wait (574s) I get:

Error in data.table::rbindlist(x, use.names = TRUE, fill = TRUE) : 
  Column 2 of item 23 is length 0, inconsistent with first column of that item which is length 1. rbind/rbindlist doesn't recycle as it already expects each item to be a uniform list, data.frame or data.table
In addition: Warning messages:
1: /S0716-97602008000300004 agency not found - proceeding with 'crossref' ... 
2: /S0716-97602008000300004 w/ (404) - <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<title>Error: DOI Not Found</title>

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

<link rel="icon" href="/static/img/favicon.png" />
<link rel="shortcut icon" href="/static/favicon.ico" type="image/x-icon" /> 
<link href="/static/style/new-style2.css" rel="stylesheet" type="text/css" />
</head>

<body>

<div style="background:#fcb426">
<img src="/static/img/banner-413.gif" alt="Logo" width="620" height="137" border="0" />
</div>

<div style="height:1px;background:#000000"></div>
<div style="height:1px;background:#54524f"></div>
<div style="height:1px;background:#f6911e"></div>

<!-- TABLE FOR NAVIGATION BAR -->
<table width="100%" border="0" cellpadding="0" cellspacing="0" id="navtable" align="center">
<tr>
    <td width="34" height="26" bgcolor="#231f20"><img src="/static/img/tran [... truncated] 
3: 10.3389/fnagi.2014.00194 agency not found - proceeding with 'crossref' ... 
4: 10.3389/fnagi.2014.00194 w/ (404) - <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<title>Error: DOI Not Found</title>

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

<link rel="icon" href="/static/img/favicon.png" />
<link rel="shortcut icon" href="/static/favicon.ico" type="image/x-icon" /> 
<link href="/static/style/new-style2.css" rel="stylesheet" type="text/css" />
</head>

<body>

<div style="background:#fcb426">
<img src="/static/img/banner-413.gif" alt="Logo" width="620" height="137" border="0" />
</div>

<div style="height:1px;background:#000000"></div>
<div style="height:1px;background:#54524f"></div>
<div style="height:1px;background:#f6911e"></div>

<!-- TABLE FOR NAVIGATION BAR -->
<table width="100%" border="0" cellpadding="0" cellspacing="0" id="navtable" align="center">
<tr>
    <td width="34" height="26" bgcolor="#231f20"><img src="/static/img/tran [... truncated] 

574.055 sec elapsed

sckott commented 6 years ago

thanks, i'll have a look

sckott commented 6 years ago

i think most of the added time is that we are calling rcrossref::cr_cn one at a time. rcrossref::cr_cn can do more than one DOI at a time, but going to have to have some very careful processing to make sure all data and metadata stay together in the case of errors
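One way the bookkeeping could look, as a sketch only: try the batch call first, accept it only if it returns exactly one result per DOI, and otherwise fall back to one request per DOI so that every result stays attached to its DOI. `fetch_many`, `fetch_batch` and `fetch_one` are hypothetical names; this is not how rorcid does it.

```r
fetch_many <- function(dois, fetch_batch, fetch_one) {
  out <- tryCatch(fetch_batch(dois), error = function(e) NULL)
  # A batch result is only trusted when it lines up one-to-one with the input.
  if (is.null(out) || length(out) != length(dois)) {
    out <- lapply(dois, function(d) {
      tryCatch(fetch_one(d), error = function(e) NULL)
    })
  }
  # Failed or empty lookups become NA, so rows never shift.
  citations <- vapply(out, function(v) {
    if (length(v) == 1) as.character(v) else NA_character_
  }, character(1), USE.NAMES = FALSE)
  data.frame(doi = dois, citation = citations, stringsAsFactors = FALSE)
}

# With rcrossref, both fetchers could wrap cr_cn (not run here):
# fetch_many(dois,
#   fetch_batch = function(ds) rcrossref::cr_cn(dois = ds, format = "bibtex"),
#   fetch_one   = function(d)  rcrossref::cr_cn(dois = d, format = "bibtex"))
```

The fallback repeats the slow path only when the batch cannot be trusted, which keeps the common case to a single call.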

sckott commented 5 years ago

orcid_citations has been merged into master

sckott commented 5 years ago

notes:

for both of these, it looks like the data from ORCID is bad as those are the ID's given for works. either bad from user input or in the automated integrations with partners

@rcpeters is there a way to fix these on ORCID's side?

sckott commented 5 years ago

@gorkang your eg works for me but does throw those warnings, which come from rcrossref, which on an unsuccessful HTTP request returns NULL and throws a warning - but the function overall still should succeed.
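For anyone hitting the earlier "Column 2 of item 23 is length 0" error in their own code, a sketch of one way to guard against NULL or zero-length fields before binding rows. It uses base `rbind` rather than `data.table::rbindlist`, and `bind_records` and the sample records are hypothetical.

```r
# Replace NULL / zero-length fields with NA so every record has
# length-1 columns, then bind the records into one data frame.
bind_records <- function(records) {
  rows <- lapply(records, function(rec) {
    padded <- lapply(rec, function(v) if (length(v) == 0) NA else v)
    as.data.frame(padded, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Made-up records: the second lookup failed and returned NULL.
records <- list(
  list(doi = "10.1/a", citation = "@article{a}"),
  list(doi = "10.1/b", citation = NULL)
)
out <- bind_records(records)
```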

sckott commented 5 years ago

closing for now - we can reopen if further changes needed