subugoe / hoad

Deprecated: Please check https://github.com/subugoe/hoaddash
GNU Affero General Public License v3.0

add plots and analysis from `_cr_springer.Rmd` again #211

Closed maxheld83 closed 4 years ago

maxheld83 commented 4 years ago

Same as #210: also temporarily removed to debug #82.

Must check whether there's something in here we're missing elsewhere.

maxheld83 commented 4 years ago

title: "Comparing the indexing coverage of SpringerLink with that of Crossref"
author: "Najko Jahn"
output:
  html_document:
    keep_md: true
    df_print: paged

knitr::opts_chunk$set(echo = TRUE, message=FALSE)

In a blog post (https://www.intact-project.org/general/openapc/2018/03/22/offsetting-coverage/), the INTACT project compared the indexing coverage of Crossref, a DOI registration agency for scholarly works, with that of SpringerLink, a digital library dedicated to content published by Springer. Examining five journals, including European Radiology, they found that article coverage differs between the two sources and concluded:

"The results are clear: When it comes to journal metrics (both OA and total), Crossref data is too sketchy to rely on."

This is a harsh verdict, given the importance of Crossref for studying the prevalence of open access (https://peerj.com/articles/4375/) and for open access monitoring (http://www.knowledge-exchange.info/event/oa-monitoring). So, let's examine whether we come to the same conclusion.

Analyses

To do so, I first downloaded the yearly article metadata (as CSV files) for the journal European Radiology (http://www.springer.com/medicine/radiology/journal/330) from SpringerLink, starting with 2015.

Let's load these metadata into R and obtain information about when and in which volumes articles were published:

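library(tidyverse)
# list.files() takes a regular expression, so escape the dot and anchor the extension
my_files <- list.files(pattern = "\\.csv$")
# read every SpringerLink export and row-bind the results into one data frame
springer <- purrr::map_df(my_files, readr::read_csv)
# number of records per publication year and journal volume
springer %>%
  count(`Publication Year`, `Journal Volume`)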

Four records seem to contain journal-level information rather than articles. There are also online-first articles published in 2017 and 2018 that have not yet appeared in a printed volume.

Now, let's obtain metadata via the Crossref API (https://api.crossref.org/) using the rcrossref package (https://github.com/ropensci/rcrossref), and check whether Crossref and SpringerLink index the same articles published in European Radiology in 2015 and 2016. To this end, I first query with the from-pub-date parameter, as the INTACT study did, and then with the from-print-pub-date parameter, to avoid confusing online-first and print publication dates.

library(rcrossref)
# R call representing from-pub-date query
cr_from_online <- rcrossref::cr_works(
  filter = c(
    issn = "0938-7994",
    from_pub_date = "2015-01-01",
    until_pub_date = "2016-12-31",
    type = "journal-article"
  ),
  limit = 1000, cursor = "*", cursor_max = 5
)

# R call representing from-print-pub-date query
cr_from_print <- rcrossref::cr_works(
  filter = c(
    issn = "0938-7994",
    from_print_pub_date = "2015-01-01",
    until_print_pub_date = "2016-12-31",
    type = "journal-article"
  ),
  limit = 1000, cursor = "*", cursor_max = 5
)
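
The text refers to the filters by their hyphenated REST API names (from-pub-date, from-print-pub-date), while the R calls use underscores. As a rough sketch of how the two relate, the helper below builds the kind of query URL these calls presumably correspond to. The function name `build_crossref_url` is hypothetical, and the underscore-to-hyphen translation is an assumption about what rcrossref does internally. No request is sent.

```r
# Hypothetical helper (not part of rcrossref): turn a named filter vector
# into a Crossref REST API query URL. Illustration only; nothing is requested.
build_crossref_url <- function(filters, rows = 1000) {
  # Assumption: underscore filter names map onto the API's hyphenated names
  names(filters) <- gsub("_", "-", names(filters), fixed = TRUE)
  # The API expects filters as comma-separated name:value pairs
  filter_string <- paste(names(filters), filters, sep = ":", collapse = ",")
  paste0(
    "https://api.crossref.org/works?filter=", filter_string,
    "&rows=", rows, "&cursor=*"
  )
}

print_url <- build_crossref_url(c(
  issn = "0938-7994",
  from_print_pub_date = "2015-01-01",
  until_print_pub_date = "2016-12-31",
  type = "journal-article"
))
print(print_url)
```

Swapping `from_print_pub_date` and `until_print_pub_date` for `from_pub_date` and `until_pub_date` gives the query variant used by the INTACT study, which makes the single difference between the two result sets easy to see.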

Are there different result sets?

Dataset obtained from querying by first date of publication:

cr_from_online$data %>% 
  count(volume)

Dataset obtained from querying by date of publication in a printed volume:

cr_from_print$data %>%
  count(volume)

While the articles returned by the from-pub-date query were published in three different yearly volumes, filtering with from_print_pub_date yields the same number of articles as obtained from SpringerLink.

Finally, let's use DOIs to check whether the SpringerLink 2015-2016 set and the Crossref from_print_pub_date set are equal:

# filter 2015 and 2016 publications
springer_15_16 <- springer %>%
  filter(`Publication Year` %in% c(2015, 2016))
setequal(springer_15_16$`Item DOI`, cr_from_print$data$DOI)
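
`setequal()` only reports whether the two sets match. If it returned FALSE, it would help to see which DOIs are missing on each side. The sketch below shows one way to do that with base R. The DOIs are made-up toy values and `compare_doi_sets` is a hypothetical helper name. DOIs are lower-cased first because they are case-insensitive, so the two sources could spell the same DOI differently.

```r
# Toy DOI vectors (hypothetical values, not taken from the actual data)
springer_dois <- c(
  "10.1007/S00330-015-0001-1",
  "10.1007/s00330-015-0002-2",
  "10.1007/s00330-016-0003-3"
)
crossref_dois <- c(
  "10.1007/s00330-015-0001-1",
  "10.1007/s00330-015-0002-2",
  "10.1007/s00330-016-0004-4"
)

# DOIs are case-insensitive, so compare them in a normalised form
normalise_doi <- function(x) tolower(trimws(x))

# Hypothetical helper: report set equality plus the DOIs unique to each side
compare_doi_sets <- function(a, b) {
  a <- unique(normalise_doi(a))
  b <- unique(normalise_doi(b))
  list(
    equal = setequal(a, b),
    only_in_a = setdiff(a, b),
    only_in_b = setdiff(b, a)
  )
}

res <- compare_doi_sets(springer_dois, crossref_dois)
str(res)
```

With the real data, the call would presumably be `compare_doi_sets(springer_15_16$`Item DOI`, cr_from_print$data$DOI)`.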

Conclusion

In conclusion, comparing Crossref and SpringerLink for articles published in "European Radiology" in 2015 and 2016 revealed no differences in article coverage between the two sources. However, when comparing the indexing coverage of Crossref and SpringerLink, query parameters must be harmonized to ensure that the article sets are comparable.

Session info

sessionInfo()

njahn82 commented 4 years ago

It does not need to be migrated to the hoad implementation.

On Tue, 2 Jun 2020 at 17:39, Max Held notifications@github.com wrote:



maxheld83 commented 4 years ago

thanks!