mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License

add citation.cff to repo #9624

Closed · drammock closed this 2 years ago

drammock commented 3 years ago

GitHub now supports a "cite this repository" feature. We should consider adding a CITATION.cff file to enable it for this repo. More info here:

https://twitter.com/natfriedman/status/1420122675813441540

cc @adam2392 (in case you want this for mne-connectivity) cc @sappelhoff (for mne-bids) cc @hoechenberger (for mne-bids-pipeline)

hoechenberger commented 3 years ago

Yeah I'm not convinced, but willing to change my mind: https://github.com/mne-tools/mne-bids/issues/460#issuecomment-888192809

sappelhoff commented 3 years ago

I share Richard's concern. I'd want this to link to https://joss.theoj.org/papers/10.21105/joss.01896 for mne-bids, and I'm not sure that's how GitHub intends this new feature to be used.

drammock commented 3 years ago

I think the intent is clearly to make it easy to cite the software itself instead of a canonical paper, but it is possible to include info about canonical papers in the CITATION.cff file too (in an optional references section). You can also customize the message that appears in the little GitHub flyout box, so it would be possible to draw attention to any included canonical refs.
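
For illustration, a minimal CITATION.cff along those lines might look something like this (a sketch only: the author entry and version metadata are placeholders, though the Frontiers DOI is real):

```yaml
# CITATION.cff, placed at the repository root; GitHub picks it up automatically.
cff-version: 1.2.0
message: >-
  If you use this software, please cite both the software itself and the
  canonical paper(s) listed under "references" below.
title: MNE-Python
authors:  # placeholder; the real file would enumerate contributors
  - family-names: Doe
    given-names: Jane
repository-code: https://github.com/mne-tools/mne-python
license: BSD-3-Clause
# Optional section where the canonical journal papers can be listed; there is
# also a `preferred-citation` key if one of them should be surfaced first.
references:
  - type: article
    title: MEG and EEG data analysis with MNE-Python
    journal: Frontiers in Neuroscience
    year: 2013
    doi: 10.3389/fnins.2013.00267
    authors:
      - family-names: Gramfort
        given-names: Alexandre
```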

I've had a few conversations with @larsoner and @agramfort about the issue of "cite the paper vs cite the software" over the last year. If I may summarize the two perspectives that are most in tension:

- Cite the paper: journal citations are what currently counts in academic evaluation (grants, hiring), so steering citations toward the canonical papers maximizes their measurable benefit to the authors.
- Cite the software: the software is the product of many contributors, most of whom are not authors on the canonical papers, so citing the software itself credits everyone's work.

Here are my personal comments:

  1. The impression I'm getting from my (admittedly somewhat limited) interactions with the open science / open data communities is that most of those folks prefer and encourage citations of the software itself rather than a canonical paper (or at least, that citing the canonical journal paper should come in addition to citing the software itself).
  2. MNE-Python already has two "canonical" citations, which already makes it very hard to quantify impact (e.g., we don't know how many papers cite us, because we don't have a good way of detecting which papers cite both of our canonical refs so as to count them once instead of twice). Having to deal with citations of the software itself doesn't seem like it would make that problem much worse than it already is.
  3. Fairness / giving contributors credit is important, regardless of whether they contributed before or after the canonical journal publication. I would even make the stronger claim that something akin to exploitation happens when some people (authors on the canonical paper) benefit from the work of others who are not given credit in that way. At the time there was no realistic alternative, but now that there is a relatively easy way to give credit more equitably, I think we should do so.

To drive home the point: the Frontiers paper has 11 authors, but the all-time #4 contributor to MNE-Python (by number of commits) is @jaeilepp, who is not among those 11 authors. Neither are @jona-sassenhagen (#7), @kingjr (#8), myself (#11), nor @GuillaumeFavelier (#12).

The last time we talked about this, I think I convinced @agramfort to at least allow us to recommend citing both the software and the canonical journal papers. The upshot of that was adding codemeta.json to the repo, but I never got around to the subsequent step of making it easy for people to convert that into a copy-pastable citation, and then altering our docs to update our recommendation. This new integration now makes that next step easier.
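
(For the "copy-pastable citation" part: my understanding is that the cffconvert tool can do that conversion, roughly like this, though I haven't double-checked the exact flags:)

```sh
pip install cffconvert
cffconvert --validate        # check CITATION.cff against the CFF schema
cffconvert --format bibtex   # print a copy-pastable BibTeX entry
```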

Thoughts?

cbrnr commented 3 years ago

Thanks for this very nice summary @drammock! I completely agree with your point about fairness: I think citing the software directly is definitely an improvement over the current situation (even if done in addition to citing the publication). One additional issue you did not mention is that only journal publications count in academia, which is probably why almost everyone would prefer to be on a publication rather than just part of a long contributor list in a software citation. I could not disagree more with this academic policy, but that's how it is at the moment. So anything that improves this situation is very welcome, and that includes making it easier for people to cite the software. Therefore, I'm +1 for adding this new functionality. As a side note, it's questionable to use the number of commits as the only performance measure; if software citations become more widely accepted in the future, we will have to think about how to better measure contributions (but I know you used commit counts here just to make a point).

sappelhoff commented 3 years ago

Yes, thanks for this excellent summary @drammock. I think I was a bit short-sighted and selfish in my comment, wanting to profit from a couple more citations of the JOSS paper and to keep people from citing only the software. But then again, even in a young package like MNE-BIDS we have at least one author (@adam2392) who has invested a great deal of effort but is not on the paper because, as luck would have it, he started contributing only slightly after that publication ... and that's unfair and should be remedied.

I think citing the software and the paper is a good recommendation in general. A couple of packages have been recommending this for a long time, for example pybids:

To credit PyBIDS in your work, please cite both the JOSS paper and the Zenodo archive. The former provides a high level description of the package, and the latter points to a permanent record of all PyBIDS versions

For MNE-Python with its two papers, I don't know what the recommendation should be (the software plus both papers? or the software plus just one of them?).

There is also some news with respect to Zenodo (and Zotero), which will make use of this new GH feature: https://twitter.com/ZENODO_ORG/status/1420357001490706442

As a side note, it's questionable to use the number of commits as the only performance measure;

yes, I agree. Taking part in discussions, doing code review, providing user support, or assessing the "impact per commit" are all harder to measure.

Just FYI, there is also https://github.com/casperdcl/git-fame, with which you can measure impact by "surviving lines of code" (or deletions, insertions, ...).
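
If I remember its README correctly, basic usage is just this (the exact options may differ):

```sh
pip install git-fame
# prints a per-author table; by default the "loc" column counts
# surviving lines of code (see --help for the other statistics)
git fame
```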

hoechenberger commented 3 years ago

Thanks @drammock!

I think I should clarify that I'm not against adding this at all; I just question its value: I have yet to come across a single journal that would accept software "citations". I've tried it with every single article I've published, and every single time the typesetters demanded a reference to a published, peer-reviewed paper instead (i.e., not even a Zenodo DOI was sufficient). So my software references would typically end up as URLs in footnotes.

mmagnuski commented 3 years ago

Thanks for the summary @drammock, I agree with your points and all the other comments here, and I think it is a good idea to add CITATION.cff and to recommend citing both the software and the paper.

@hoechenberger I agree it's not always easy to cite software in academic papers, but things are changing. eLife is a good example of a journal accepting software citations: we had no trouble citing GitHub code repositories in our last paper there (these citations show up in the bibliography with small grey text saying "software"). As @cbrnr mentions, we have to keep pushing and demand software citations with every paper we publish :) After reading this thread I am committed to citing every significant piece of software (and not just the papers) in my future publications. :rocket:

cbrnr commented 3 years ago

After reading this thread I am committed to citing every significant piece of software (and not just the papers) in my future publications. 🚀

I agree. Although this leads to further questions, such as: which software tools are significant enough to be cited? If I did an EEG/MEG analysis with MNE-Python, this certainly means MNE-Python. But what about things used by MNE-Python, such as NumPy, SciPy, and Matplotlib? Or, going even further, what about Python (CPython) itself, and maybe VS Code, and possibly Linux? IMO, I'd only cite packages that are directly relevant to a given research project.

hoechenberger commented 3 years ago

IMHO software doesn't fit into the currently used citation scheme of printed journals. For online publications, I think it would make sense if one could specify the software used to produce the final results, and if there were a way to automatically extract the dependency graph from each package and attach it to the publication. I would commonly also cite Inkscape if I use it to finalize some figures, but I've got the impression nobody else does this. Maybe a "Used Software" table should become mandatory when submitting a paper, with the name and version of each package? It would also help with reproducibility, I suppose.
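
As a rough illustration, in the MNE world something like this could already populate such a table (mne.sys_info() exists today; the importlib.metadata loop is just a generic sketch with illustrative package names):

```python
# Print versions of MNE-Python and its main dependencies in one go.
import mne
mne.sys_info()

# Generic sketch: build a name/version table for an arbitrary package list.
from importlib.metadata import version

for pkg in ("mne", "numpy", "scipy", "matplotlib"):  # illustrative names
    print(f"{pkg}\t{version(pkg)}")
```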

mmagnuski commented 3 years ago

@cbrnr Yeah, there is some complexity to the decision process of choosing which software should be cited, so there are always "grey areas". I think I would cite all the non-standard packages that I imported and used myself and that I consider relevant for the study. In many cases that would include NumPy, SciPy, Pandas, and Matplotlib (and many others). @hoechenberger The used-software table is a good idea; actually, eLife introduced something like this some time ago in the form of a Key Resources Table, but regrettably I didn't use it.

kingjr commented 3 years ago

@drammock I agree with the above points and related discussion.

A small comment: dilution (i.e. publishing multiple papers on a software package) is generally advantageous, because software is cited very heavily, so you'd need to dilute across a huge number of papers before it really becomes detrimental to one's h-index.

Relatedly, I personally became less active after realizing that PR-related work is not valued in grant or job applications as long as it's not linked to an authored paper, and I suspect this has been the case for several core contributors. But I do think any form of visibility counts. There are many ways to pay tribute to the contributors, e.g.

agramfort commented 3 years ago

to get more visibility to contributors, I've been pushing over the years to add the name of each contributor to the what's new page, and to have the names of authors at the top of the scripts, examples, and tutorials. Also, during each release email we credit each contributor. That's the best I can do to offer more visibility, but I encourage any other initiative. The blog posts by Clemens years ago were retweeted by the MNE twitter account and myself.

I cannot fix the academic credit issue dominated by "real" journal papers, but I am sure Zenodo is a start.

my 2c

drammock commented 3 years ago

to get more visibility to contributors, I've been pushing over the years to add the name of each contributor to the what's new page, and to have the names of authors at the top of the scripts, examples, and tutorials. Also, during each release email we credit each contributor.

Those are all good steps, and appreciated!

That's the best I can do to offer more visibility but I encourage any other initiative.

Should we interpret "I encourage any other initiative" as a green light to (1) add a CITATION.cff file, and (2) change our docs to encourage users to cite both the software itself and the relevant paper(s)?

agramfort commented 3 years ago

Should we interpret "I encourage any other initiative" as a green light to (1) add a CITATION.cff file, and (2) change our docs to encourage users to cite both the software itself and the relevant paper(s)?

Yes 👍

adam2392 commented 2 years ago

Was this citation ever added?

drammock commented 2 years ago

Was this citation ever added?

nope. wanna make a PR?

adam2392 commented 2 years ago

Was this citation ever added?

nope. wanna make a PR?

Is there a good way to get a list of everyone involved to make a file such as this?

https://github.com/bids-standard/pybv/pull/76

drammock commented 2 years ago

Is there a good way to get a list of everyone involved to make a file such as this?

have a look at tools/generate_codemeta.py; it will get you most of the way there, I think.
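
If you want a quick-and-dirty starting point, a sketch of the general idea (hypothetical; not what generate_codemeta.py actually does) might be:

```python
# List contributors by commit count via `git shortlog` and emit
# CITATION.cff-style author stubs for manual cleanup afterwards.
import subprocess

log = subprocess.run(
    ["git", "shortlog", "-nse", "HEAD"],  # -n sort by count, -s summary, -e emails
    capture_output=True, text=True, check=True,
).stdout

for line in log.splitlines():
    _count, _, rest = line.strip().partition("\t")
    name = rest.rsplit("<", 1)[0].strip()  # drop the trailing "<email>"
    parts = name.split()
    if len(parts) < 2:
        print(f"# needs manual handling: {name}")
        continue
    print(f"- family-names: {parts[-1]}")
    print(f"  given-names: {' '.join(parts[:-1])}")
```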