Closed gorkang closed 5 years ago
thanks for the question @gorkang
@rcpeters another question for you. It seems the journal name is shown in the ORCID UI for peer reviews, but I can't seem to find it in the API response. Any guidance?
For peer reviews, the journal name (or publisher name, or organisation, etc) is going to be found in the group data.
https://pub.orcid.org/v2.1/0000-0001-7678-8656/peer-review/220419 ->
thanks @alainna for that, that's what we needed: review-group-id. sorry i missed that
id = "0000-0001-7678-8656"
x = orcid_peer_reviews(id, put_code = "220419")[[1]]
rcrossref::cr_journals(strsplit(x$`review-group-id`, ":")[[1]][[2]])$data$title
#> [1] "Journal of Experimental Psychology General"
Just a note: not all group IDs are required to be ISSNs.
select split_part(group_id,':',1) as prefix, count(*) from group_id_record group by prefix;
     prefix      | count
-----------------+-------
 publons         |  1297
 orcid-generated |   100
 ringgold        |     1
 issn            | 13181
(4 rows)
Thanks @rcpeters - well i guess we can try to detect if it's an ISSN, and if so, we can try to grab the journal name
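A minimal sketch of that detection step, assuming group IDs have the form prefix:value as in the counts above. The helper names (parse_group_id, is_valid_issn, issn_from_group_id) are made up for illustration and are not part of rorcid. The check-digit test uses the standard ISSN modulus-11 rule.

```r
# Split a review-group-id such as "issn:1939-2222" into prefix and value.
# Only the first ":" is treated as the separator.
parse_group_id <- function(group_id) {
  pos <- regexpr(":", group_id, fixed = TRUE)
  if (pos < 1) {
    return(list(prefix = NA_character_, value = group_id))
  }
  list(
    prefix = tolower(substr(group_id, 1, pos - 1)),
    value  = substr(group_id, pos + 1, nchar(group_id))
  )
}

# Check the ISSN shape (NNNN-NNNC) and its modulus-11 check digit.
is_valid_issn <- function(x) {
  if (is.na(x) || !grepl("^[0-9]{4}-[0-9]{3}[0-9Xx]$", x)) return(FALSE)
  chars <- strsplit(gsub("-", "", toupper(x)), "")[[1]]
  digits <- as.integer(chars[1:7])
  check <- (11 - sum(digits * 8:2) %% 11) %% 11
  expected <- if (check == 10) "X" else as.character(check)
  identical(chars[8], expected)
}

# Return the ISSN for a group ID, or NA when the group is not ISSN-based.
issn_from_group_id <- function(group_id) {
  parts <- parse_group_id(group_id)
  if (identical(parts$prefix, "issn") && is_valid_issn(parts$value)) {
    parts$value
  } else {
    NA_character_
  }
}

issn_from_group_id("issn:1939-2222")
issn_from_group_id("publons:1234")
```

Only group IDs that come back non-NA would then be passed on to a title lookup; publons, ringgold and orcid-generated groups would need some other source for the name.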
@gorkang does this solution https://github.com/ropensci/rorcid/issues/52#issuecomment-379971887 work for you? I don't think we want to integrate rcrossref into this pkg, but we could document how to work with it to get publication title names. Thoughts?
Thanks @sckott for checking back on this.
Yes, looking for the journal name using issn works, although it is very slow, so it adds ~15s for each researcher I have (see code below).
get_orcid_reviews <- function(id) {
  # id = "0000-0001-7678-8656" #
  library(pacman)
  p_load(dplyr, rorcid)
  tictoc::tic()
  # Get reviews ---------------------------------------------------------------
  temp_reviews = orcid_peer_reviews(id)[[1]]$group$`peer-review-summary` %>%
    bind_rows()
  years_reviews = temp_reviews %>%
    # filter(`completion-date.year.value` >= from_year) %>% # we only ask for the records we need to minimize # of calls.
    pull(`completion-date.year.value`) #`put-code`
  # Get journal titles ------------------------------------------------------
  # Get put-codes
  put_codes = temp_reviews %>% pull(`put-code`)
  # Get details for reviews using put-codes
  list_orcid_reviews <- orcid_peer_reviews(id, put_code = put_codes)
  # Get issn
  issn_reviews = 1:length(list_orcid_reviews) %>%
    purrr::map(~strsplit(list_orcid_reviews[[.x]]$`review-group-id`, ":")[[1]][[2]]) %>%
    unlist()
  # Get journal name using issn
  journal_names = rcrossref::cr_journals(issn_reviews)$data$title
  # Tidy data ---------------------------------------------------------------
  df_reviews = years_reviews %>% as_tibble() %>%
    mutate(orcid_id = id) %>%
    left_join(df_orcid_names, by = "orcid_id") %>%
    rename(year = value) %>%
    mutate(journal_name = journal_names) %>%
    select(-other_names)
  tictoc::toc()
  df_reviews
}
get_orcid_reviews(id = "0000-0001-7678-8656")
Taking those extra 15s for each researcher feels particularly wasteful as the journal name is on the ORCID website (but for some reason not in the ORCID data):
Any idea to make it faster would be greatly appreciated.
Thanks!
@gorkang just took another look at this.
i can't replicate your function above because the object df_orcid_names is missing, but I think i have a solution.
I just added a dataset of ISSNs and journal titles gathered from Crossref. I still need to work out a process for updating it, or letting users do so, but it is much faster, e.g.,
system.time({
  id = "0000-0001-7678-8656"
  x = orcid_peer_reviews(id, put_code = "220419")[[1]]
  issn <- strsplit(x$`review-group-id`, ":")[[1]][[2]]
  rcrossref::cr_journals(issn)$data$title
})
   user  system elapsed
  0.071   0.003   0.774
system.time({
  id = "0000-0001-7678-8656"
  x = orcid_peer_reviews(id, put_code = "220419")[[1]]
  issn <- strsplit(x$`review-group-id`, ":")[[1]][[2]]
  issn_title[[issn]]
})
   user  system elapsed
  0.010   0.001   0.102
Thanks @sckott for taking another look at this.
The new method does work better, but fails when the issn is not in issn_title.rda (btw, I had to download it manually. Maybe it does not load with the package?)
So, to solve the first point, I created a function to get the title with the best available method:
get_title_from_issn <- function(issn) {
load("issn_title.rda") # CHANGE PATH AS NEEDED
tryCatch(issn_title[[issn]], error = function(e) {rcrossref::cr_journals(issn)$data$title})
}
journal_names = issn_reviews %>% purrr::map( ~ get_title_from_issn(.x)) %>% unlist()
In the specific case I am trying, there are 6 out of 20 issn not present in issn_title.rda. The time it takes goes down from ~31 to ~17 seconds.
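One possible way to trim this further, sketched under assumptions: look everything up in the local table in one vectorised step, then call the remote service only once per distinct missing ISSN. Here resolve_titles and its remote_lookup argument are hypothetical names, local_titles is a toy stand-in for issn_title (assumed to be a named character vector), and fake_remote is a stub so the example runs offline.

```r
# Resolve journal titles for a vector of ISSNs.
# local_titles:  named character vector, names are ISSNs (stand-in for issn_title)
# remote_lookup: function(issn) returning a title; hypothetical hook for the slow path
resolve_titles <- function(issns, local_titles, remote_lookup) {
  titles <- unname(local_titles[issns])   # NA where the ISSN is not in the table
  missing <- unique(issns[is.na(titles) & !is.na(issns)])
  if (length(missing) > 0) {
    # Each distinct missing ISSN is fetched once; failures become NA.
    fetched <- vapply(missing, function(i) {
      out <- tryCatch(remote_lookup(i), error = function(e) NA_character_)
      if (length(out) == 0) NA_character_ else as.character(out[[1]])
    }, character(1))
    fill <- is.na(titles)
    titles[fill] <- unname(fetched[issns[fill]])
  }
  titles
}

# Toy data and a stub that counts how often the slow path is hit.
local_titles <- c("1939-2222" = "Journal of Experimental Psychology General")
calls <- 0
fake_remote <- function(issn) {
  calls <<- calls + 1
  paste("Title for", issn)
}

titles <- resolve_titles(c("1939-2222", "1234-5678", "1234-5678"),
                         local_titles, fake_remote)
titles
calls
```

In real use the stub could be swapped for something like function(i) rcrossref::cr_journals(i)$data$title, as in the code above. Loading issn_title.rda once, outside the per-ISSN function, should also avoid re-reading the file for every record.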
Please, see the full code below. I adapted the get_orcid_reviews() function so you can select the "method" (new or old). Sorry for leaving df_orcid_names in the previous code. Now it should work.
get_orcid_reviews <- function(id, method = "new") {
  library(pacman)
  p_load(dplyr, rorcid)
  tictoc::tic()
  # Get reviews ---------------------------------------------------------------
  temp_reviews = orcid_peer_reviews(id)[[1]]$group$`peer-review-summary` %>%
    bind_rows()
  years_reviews = temp_reviews %>%
    # filter(`completion-date.year.value` >= from_year) %>% # we only ask for the records we need to minimize # of calls.
    pull(`completion-date.year.value`) #`put-code`
  # Get journal titles ------------------------------------------------------
  # Get put-codes
  put_codes = temp_reviews %>% pull(`put-code`)
  # Get details for reviews using put-codes
  list_orcid_reviews <- orcid_peer_reviews(id, put_code = put_codes)
  # Get issn
  issn_reviews = 1:length(list_orcid_reviews) %>%
    purrr::map(~strsplit(list_orcid_reviews[[.x]]$`review-group-id`, ":")[[1]][[2]]) %>%
    unlist()
  # GET JOURNAL NAMES -------------------
  # METHOD A (slow) Get journal name using issn
  if (method == "old") {
    journal_names = rcrossref::cr_journals(issn_reviews)$data$title
    # METHOD B (new) Get journal name using issn
  } else if (method == "new") {
    get_title_from_issn <- function(issn) {
      load("dev/BUGS/BUG - reviews slow/issn_title.rda")
      tryCatch(issn_title[[issn]], error = function(e) {rcrossref::cr_journals(issn)$data$title})
    }
    journal_names = issn_reviews %>% purrr::map( ~ get_title_from_issn(.x)) %>% unlist()
  }
  # Tidy data ---------------------------------------------------------------
  df_reviews = years_reviews %>% as_tibble() %>%
    mutate(orcid_id = id) %>%
    # left_join(df_orcid_names, by = "orcid_id") %>%
    rename(year = value) %>%
    mutate(journal_name = journal_names) #%>% select(-other_names)
  tictoc::toc()
  df_reviews
}
get_orcid_reviews(id = "0000-0001-7678-8656", method = "old")
get_orcid_reviews(id = "0000-0001-7678-8656", method = "new")
Thanks!
sorry for the long delay in responding @gorkang - it's not clear from your last reply whether you are happy with the changes, or whether there are still some improvements we can make?
No problem @sckott . Last time I checked, there were two problems:
1) The function failed when the issn was not in issn_title.rda
2) I had to download issn_title.rda manually
Cheers.
I'm not having that problem. just removed rorcid then reinstalled from github, loaded rorcid and issn_title is there in the session. will keep thinking about what the problem could be
Regarding the first issue:
If an ISSN exists, it works great. If it does not exist, gives an error:
issn_title[["1939-2222"]]
[1] "Journal of Experimental Psychology General"
issn_title[["0000-2222"]]
Error in issn_title[["0000-2222"]] : subscript out of bounds
With a function such as the following, we can avoid the error:
get_title_from_issn <- function(issn) {
tryCatch(issn_title[[issn]], error = function(e) {rcrossref::cr_journals(issn)$data$title})
}
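As an alternative to tryCatch, here is a small sketch that relies on base R indexing rules. The "subscript out of bounds" error suggests issn_title is a named atomic vector, which is an assumption here; for such a vector, `[[` errors on an unknown name while `[` returns NA. issn_title_demo is a toy stand-in and lookup_title is a hypothetical helper.

```r
# Toy stand-in for the packaged issn_title dataset
# (assumed to be a named character vector).
issn_title_demo <- c("1939-2222" = "Journal of Experimental Psychology General")

# Single-bracket indexing returns NA for unknown names instead of erroring,
# and it accepts a whole vector of ISSNs at once.
lookup_title <- function(issn, table = issn_title_demo) {
  unname(table[issn])
}

lookup_title("1939-2222")
lookup_title("0000-2222")
lookup_title(c("1939-2222", "0000-2222"))
```

The NA results mark exactly the ISSNs that would still need a remote lookup, so the fallback can be applied to those entries only.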
Regarding the second issue: after uninstalling using the GUI it wasn't working, but using the remove.packages() function worked:
remove.packages("rorcid")
devtools::install_github("ropensci/rorcid")
library('rorcid')
Also, a final comment, for a single researcher with 20 review records (6 not in the issn_title file) it takes about 10s to fetch the journal titles. It is much better than the ~30s it used to take, but hopefully, there is still some room for improvement.
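If many researchers review for the same journals, one idea (not something rorcid does, as far as this thread shows) is to keep a session-level cache so each ISSN missing from the local file is fetched remotely at most once. A rough sketch, where title_cache, cached_title and slow_fetch are hypothetical names and slow_fetch is a stub standing in for the remote call:

```r
# Session-level cache mapping ISSN -> title, shared across calls.
title_cache <- new.env(parent = emptyenv())

# Return the cached title, calling fetch(issn) only on the first request.
cached_title <- function(issn, fetch) {
  if (!exists(issn, envir = title_cache, inherits = FALSE)) {
    assign(issn, fetch(issn), envir = title_cache)
  }
  get(issn, envir = title_cache, inherits = FALSE)
}

# Stub for the slow remote lookup; counts how many times it is called.
n_fetch <- 0
slow_fetch <- function(issn) {
  n_fetch <<- n_fetch + 1
  paste("Title for", issn)
}

a <- cached_title("1939-2222", slow_fetch)
b <- cached_title("1939-2222", slow_fetch)  # served from the cache
n_fetch
```

The cache only lives for the R session; whether it helps depends on how much the journals overlap between researchers.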
Thanks!
thanks - i'll take another look at the issn issue.
> hopefully, there is still some room for improvement.
we'll continue to look for performance improvements 👍
note: still no ISSNs in the Crossref API /journals route, so can't work on update flow for the issn titles dataset
closing for now - added the script for updating the issn_title dataset in inst/ignore/issn_title_collect.R
I am trying to get the peer review activity from ORCID profiles using orcid_peer_reviews(). Everything seems to work fine, but I cannot find the journal names of the reviews.
For example, to get the following review from an ORCID profile...
I use the code below, but the closest I can get to the journal name is through the Publons website URL. I can't see it in the general orcid_peer_reviews(id) or the orcid_peer_reviews(id, put_code) calls.
Below the session details.