rOpenGov / eurostat

R tools for Eurostat data
http://ropengov.github.io/eurostat

Package citation #275

pitkant closed this issue 8 months ago

pitkant commented 1 year ago

Providing a clear and concise way to cite the package is essential in encouraging users to cite the software they use. Traditionally, writing a software publication in a journal may have been seen as a more legitimate way to provide a citable reference to academic users, but in recent years actors such as FORCE11 have encouraged citing software directly. From the Software Citation Principles:

  1. Importance: Software should be considered a legitimate and citable product of research. Software citations should be accorded the same importance in the scholarly record as citations of other research products, such as publications and data; they should be included in the metadata of the citing work, for example in the reference list of a journal article, and should not be omitted or separated. Software should be cited on the same basis as any other research product such as a paper or a book, that is, authors should cite the appropriate set of software products just as they cite the appropriate set of papers.

  2. Credit and attribution: Software citations should facilitate giving scholarly credit and normative, legal attribution to all contributors to the software, recognizing that a single style or mechanism of attribution may not be applicable to all software.

  3. Unique identification: A software citation should include a method for identification that is machine actionable, globally unique, interoperable, and recognized by at least a community of the corresponding domain experts, and preferably by general public researchers.

  4. Persistence: Unique identifiers and metadata describing the software and its disposition should persist—even beyond the lifespan of the software they describe.

  5. Accessibility: Software citations should facilitate access to the software itself and to its associated metadata, documentation, data, and other materials necessary for both humans and machines to make informed use of the referenced software.

  6. Specificity: Software citations should facilitate identification of, and access to, the specific version of software that was used. Software identification should be as specific as necessary, such as using version numbers, revision numbers, or variants such as platforms.

The current citation for the eurostat package is:

> citation("eurostat")
Kindly cite the eurostat R package as follows:

  (C) Leo Lahti, Janne Huovari, Markus Kainu, Przemyslaw Biecek. Retrieval and
  analysis of Eurostat open data with the eurostat package. R Journal
  9(1):385-392, 2017. doi: 10.32614/RJ-2017-019 Package URL:
  http://ropengov.github.io/eurostat Article URL:
  https://journal.r-project.org/archive/2017/RJ-2017-019/index.html

A BibTeX entry for LaTeX users is

  @Article{,
    title = {Retrieval and Analysis of Eurostat Open Data with the eurostat Package},
    author = {Leo Lahti and Janne Huovari and Markus Kainu and Przemyslaw Biecek},
    journal = {The R Journal},
    volume = {9},
    number = {1},
    pages = {385--392},
    year = {2017},
    doi = {10.32614/RJ-2017-019},
    url = {https://doi.org/10.32614/RJ-2017-019},
  }

The eurostat package has an article published in The R Journal in 2017. Citing the article fulfils the Importance requirement (principle 1), as scholarly articles tend to be cited. To some extent, Unique identification (3), Persistence (4) and Accessibility (5) are also achieved by linking to the R Journal article, which in turn links to the CRAN repository with a persistent URL. The CRAN repository stores an archive of the package's versions even if the package is removed from CRAN. However, citing the article does not attribute contributions made to the package after 2017 (2. Credit and attribution), and it does not state which version of the software was used in the analysis (6. Specificity).

Indeed, the paper by Katz et al., Software Citation Implementation Challenges, states explicitly that "R has guidance already, and that guidance does not match the software citation principles. For instance the guidance provided by the R Project does not include a version number or license information". R's guidance seems to be geared mostly towards traditional journal and book citations, since the only example it offers is for bibtype "book".

The aforementioned FORCE11 Software Citation Principles state:

Currently, and for the foreseeable future, software papers are being published and cited, in addition to software itself being published and cited, as many community norms and practices are oriented towards citation of papers. As discussed in the Importance principle (1) and the discussion above, the software itself should be cited on the same basis as any other research product; authors should cite the appropriate set of software products. If a software paper exists and it contains results (performance, validation, etc.) that are important to the work, then the software paper should also be cited. We believe that a request from the software authors to cite a paper should typically be respected, and the paper cited in addition to the software.

Therefore we should at least add a proper software citation to the eurostat package, in addition to retaining the R Journal citation.
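A version-specific software entry can sit next to the article entry. The sketch below builds both with `utils::bibentry()`, using a `Manual` entry in the same shape as the rOpenSci-style citation quoted later in this thread. The package title, year and version string are placeholders chosen for illustration, not the values eurostat actually ships (in a real `inst/CITATION` file the version would normally come from `meta$Version`), and the author list only repeats the article authors, so it would still need extending to credit later contributors.

```r
# Minimal sketch: a software (Manual) entry plus the existing article entry.
# Title, year and version below are PLACEHOLDERS, not eurostat's real values.
pkg_version <- "4.0.0"  # assumption; in inst/CITATION use meta$Version instead

authors <- c(
  person("Leo", "Lahti"),
  person("Janne", "Huovari"),
  person("Markus", "Kainu"),
  person("Przemyslaw", "Biecek")
)

# Software citation: carries the version number (Specificity, principle 6).
software <- bibentry(
  bibtype = "Manual",
  title   = "eurostat: Tools for Eurostat Open Data",  # hypothetical title
  author  = authors,
  year    = "2023",                                     # placeholder year
  note    = paste("R package version", pkg_version),
  url     = "https://CRAN.R-project.org/package=eurostat"
)

# Article citation: retained as the authors' requested paper citation.
article <- bibentry(
  bibtype = "Article",
  title   = "Retrieval and Analysis of Eurostat Open Data with the eurostat Package",
  author  = authors,
  journal = "The R Journal",
  volume  = "9",
  number  = "1",
  pages   = "385--392",
  year    = "2017",
  doi     = "10.32614/RJ-2017-019"
)

# Combine both so that citation() would list the software and the paper.
citations <- c(software, article)
print(citations, style = "text")
```

Listing the software entry first follows the FORCE11 advice to cite the software itself, while keeping the article as an additional reference.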

pitkant commented 1 year ago

Fixed in 7597740:

In the previous iteration the citation seemed to be a mishmash of several different citation styles:

(C) Leo Lahti, Janne Huovari, Markus Kainu, Przemyslaw Biecek. Retrieval and
  analysis of Eurostat open data with the eurostat package. R Journal
  9(1):385-392, 2017. doi: 10.32614/RJ-2017-019 Package URL:
  http://ropengov.github.io/eurostat Article URL:
  https://journal.r-project.org/archive/2017/RJ-2017-019/index.html

Writing R Extensions (R-exts) states that "In case a bibentry contains LaTeX markup (e.g., for accented characters or mathematical symbols), it may be necessary to provide a text representation to be used for printing via the textVersion argument to bibentry", and the bibentry help file that "Only if special LaTeX macros (e.g., math formatting) or special characters (e.g., with accents) are necessary, a textVersion should be provided.". It therefore seems that textVersion is not strictly needed, as users could simply use print(citation("eurostat"), style = "text"), but I think it is nice for end-user convenience. However, we should be consistent.

The new textVersion citation is styled after Harvard:

Lahti L., Huovari J., Kainu M., and Biecek P. (2017). Retrieval and analysis of
  Eurostat open data with the eurostat package. R Journal 9(1), pp. 385-392. doi:
  10.32614/RJ-2017-019

Why Harvard? Because it is supposedly used in almost all disciplines, whereas other popular styles such as APA, MLA and Chicago are used mainly in specific disciplines.

It is also, incidentally, quite close to what print(citation("eurostat"), style = "text") outputs:

Lahti L, Huovari J, Kainu M, Biecek P (2017). “Retrieval and Analysis of Eurostat
Open Data with the eurostat Package.” _The R Journal_, *9*(1), 385-392.
doi:10.32614/RJ-2017-019 <https://doi.org/10.32614/RJ-2017-019>,
<https://doi.org/10.32614/RJ-2017-019>.
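The difference between the generated text and a hand-written textVersion can be reproduced with `utils::bibentry()` directly. In this minimal sketch the field values and the Harvard-style string are copied from the citations quoted in this thread; building the entry in a standalone script rather than in `inst/CITATION` is an assumption made to keep the example self-contained.

```r
# Minimal sketch: the same R Journal entry with and without a textVersion.
fields <- list(
  bibtype = "Article",
  title   = "Retrieval and Analysis of Eurostat Open Data with the eurostat Package",
  author  = c(person("Leo", "Lahti"), person("Janne", "Huovari"),
              person("Markus", "Kainu"), person("Przemyslaw", "Biecek")),
  journal = "The R Journal",
  volume  = "9",
  number  = "1",
  pages   = "385--392",
  year    = "2017",
  doi     = "10.32614/RJ-2017-019"
)

# Without textVersion: the text form is generated from the BibTeX fields.
plain <- do.call(bibentry, fields)

# With textVersion: the hand-written Harvard-style string is shown when the
# entry is printed in "citation" style.
harvard <- do.call(bibentry, c(fields, textVersion = paste(
  "Lahti L., Huovari J., Kainu M., and Biecek P. (2017).",
  "Retrieval and analysis of Eurostat open data with the eurostat package.",
  "R Journal 9(1), pp. 385-392. doi: 10.32614/RJ-2017-019"
)))

print(plain, style = "text")
print(harvard, style = "citation")
```

The "citation" style is the one citation() uses when printing, which matches the textVersion-based output of citation("eurostat") quoted at the top of this issue.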

rOpenSci states that their recommended way of styling a citation is:

## To cite package 'magick' in publications use:
## 
##   Jeroen Ooms (2021). magick: Advanced Graphics and Image-Processing in
##   R. R package version 2.7.3. https://CRAN.R-project.org/package=magick
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {magick: Advanced Graphics and Image-Processing in R},
##     author = {Jeroen Ooms},
##     year = {2021},
##     note = {R package version 2.7.3},
##     url = {https://CRAN.R-project.org/package=magick},
##   }

This is somewhat close to the old solution. I'm not entirely sure which citation style the format "Firstname Lastname (year). Title. Note. URL" conforms to, so it may be best to use Harvard.

antagomir commented 1 year ago

Seems good!

pitkant commented 11 months ago

Now merged into v4-dev; feedback is very much welcome.

pitkant commented 8 months ago

Closed with the CRAN release of package version 4.0.0