openjournals / joss

The Journal of Open Source Software
https://joss.theoj.org
MIT License

Ask researchers to cite software they use before allowing publication in JOSS #1277

Open RichardLitt opened 1 year ago

RichardLitt commented 1 year ago

Currently, there isn't a process where reviewers either proactively search for, or ask researchers to search for, all software used in the stack behind a paper's software, and all citations that could be made for that software. (That's a horrible run-on sentence. Basically: if I am submitting a paper to JOSS, I should cite any and all software I use which can be cited, including dependencies.) Right now, this is the criterion used in the checklist, from https://joss.readthedocs.io/en/latest/review_checklist.html:

> References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper [citation syntax](https://rmarkdown.rstudio.com/authoring_bibliographies_and_citations.html#citation_syntax)?

I wonder if this should be extended. For instance, almost every programming language could be cited using a Zenodo package, at least.

My impetus for asking is knowing that citing software is important for RSEs and for those who need citations, and it should be on the publisher to encourage best citation processes regarding citing software. I think that JOSS could lean a bit harder on how to cite software than it currently does, and I wonder if we could encourage profligate citation methods.

Happy to talk further on this - opening this as a starter to a conversation.

rkurchin commented 1 year ago

I like this idea, but given how deep dependency stacks can go, I think it would be good if there were a tool to make it easier, since it could be quite a bit of work for an author to do this manually. To first order, one could probably parse through dependencies from repos, find the repos for those dependencies, and check for a CITATION.cff file or something like that...
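A minimal sketch of that first-order approach, in Python. It assumes the dependencies are listed in a `requirements.txt`-style file and that each dependency's repository has already been checked out under a local directory named after the package. Both function names and the directory layout are hypothetical, and resolving a package name to its repository is left out entirely.

```python
import re
from pathlib import Path
from typing import Dict, List, Optional


def parse_requirement_names(text: str) -> List[str]:
    """Extract bare package names from requirements.txt-style lines."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0).lower())
    return names


def find_citation_files(names: List[str], checkout_root: Path) -> Dict[str, Optional[Path]]:
    """Map each dependency to its CITATION.cff path, or None if it has none.

    Assumption: each dependency is checked out at checkout_root/<name>.
    """
    found = {}
    for name in names:
        candidate = checkout_root / name / "CITATION.cff"
        found[name] = candidate if candidate.is_file() else None
    return found
```

Calling `find_citation_files(parse_requirement_names(text), Path("checkouts"))` would then show which direct dependencies ship citation metadata and which do not. Going deeper into the stack would mean repeating the same step on each dependency's own requirements.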

annulen commented 11 months ago

> I like this idea, but given how deep dependency stacks can go, I think it would be good if there were a tool to make it easier,

I think it should not be mandatory to cite the whole stack, only "first order" dependencies. For comparison, when you cite the results of research work, you don't have to cite all the papers on which that research is built.

RichardLitt commented 11 months ago

It isn't mandatory to cite any software at the moment. While citing the entire stack is a bit of an absurd reduction, it would allow software which is essential to research to be cited even if it is a lower dependency. Often those deps reflect lots of work, too, and recognizing them could be a boon to the software engineers or researchers who created them.

Citing papers that research relies upon is a different (but similar) problem - those papers aren't explicitly used, while code which is a lower dependency is run each time by the software.

@rkurchin, that's what I was thinking, too. I think it's a good start.

jedbrown commented 11 months ago

Citing only the direct dependencies or only the transitive dependencies that make themselves very visible creates a bad incentive structure. It is necessary to have tooling that reports important transitive dependencies even when they are "quiet" (which often means reliably doing their job), and such tooling cannot be based solely on a static graph as you might have with CITATION.cff. Long ago, the PETSc project added a feature that reports transitive dependencies that are actually used in a given run -- we wrote a brief paper at the time. https://doi.org/10.6084/m9.figshare.785731

RichardLitt commented 11 months ago

Thanks Jed. The incentive structure is already bad; every little bit towards citing more research software counts, from my perspective.

I hacked this together today. It looks similar to what you built, but probably with fewer features in the long run. I've opened issues for the work that still needs to be done on it. Any contributions would be welcome.

https://github.com/RichardLitt/dependency-cite

My goal at this point is this: Get a script that cites all of the software used in a package. Test it on itself, and on 2-5 packages published in JOSS, and on at least one non-JOSS package. Write a short paper on the result. Publish it on... JOSS?

I want feature parity across a couple of language ecosystems, better parsing of CITATION.cff files, and DOI/ORCID catching before we get to the paper writing.
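As a rough idea of what DOI/ORCID catching could look like (this is a sketch, not what dependency-cite does), the Python below scans the raw text of a CITATION.cff file with two regular expressions. The function name and both patterns are assumptions. A real implementation would more likely parse the YAML and read the `doi` and `orcid` keys directly.

```python
import re
from typing import Dict, List

# Loose patterns: a DOI starts with "10.", a registrant code, and a slash;
# an ORCID iD is four dash-separated groups, the last possibly ending in X.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"']+")
ORCID_PATTERN = re.compile(r"\b\d{4}-\d{4}-\d{4}-\d{3}[\dX]\b")


def extract_identifiers(cff_text: str) -> Dict[str, List[str]]:
    """Collect unique DOIs and ORCID iDs found anywhere in the text."""
    return {
        "dois": sorted(set(DOI_PATTERN.findall(cff_text))),
        "orcids": sorted(set(ORCID_PATTERN.findall(cff_text))),
    }


# Placeholder metadata for illustration only; the DOI is made up and the
# ORCID iD is the sample one used in ORCID's own documentation.
example = """\
cff-version: 1.2.0
title: Example
doi: 10.5281/zenodo.1234567
authors:
  - family-names: Doe
    orcid: "https://orcid.org/0000-0002-1825-0097"
"""
identifiers = extract_identifiers(example)
```

Scanning the raw text also catches identifiers buried in nested `references` entries, at the cost of not knowing which field each one came from.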

RichardLitt commented 11 months ago

@jedbrown On a much smaller note - I'm curious, do you have the TeX template that you used to write that paper? Was it just the base `article` class?

jedbrown commented 11 months ago

@RichardLitt Yeah, plain article... Maybe email if you have further questions so we don't clog up this thread.