Remove authorship information from the code itself?

ayshih commented 4 years ago

Let's have a discussion about whether to remove authorship information from the code itself (e.g., __author__ and __email__). (See triggering comment thread.)

Some reasons to remove it:

Most of our code currently doesn't include it.
The stuff that's there isn't necessarily accurate because it hasn't been maintained well.
Git history can be a more accurate way to track authorship, particularly for code that involves a variety of contributors.
I don't know whether users are even typically aware of these metadata variables.

Some reasons to keep it:

When a code file is essentially the work of a single person, it's nice to acknowledge that person.
sunpy as a downloaded package doesn't have its Git history, so no authorship information would be available locally via Git.
Git history can be misleading about the authorship that actually matters if there are swathes of inconsequential edits (e.g., for style).
I think the average (non-developer) user may be even less knowledgeable about using Git to get authorship information than looking for metadata variables.

If we do choose to keep this authorship information, it needs to be maintained. Perhaps this decision could be made on a per-subpackage basis, so that subpackage maintainers can choose to accept that responsibility?

Discuss.

nabobalis commented 4 years ago

If it's being stored anywhere it belongs in a nice rst doc file.

Cadair commented 4 years ago

I am in favour of removing it. Even in most cases when a file started off as the work of one or two people it rarely stays like that for very long and then the comments never get updated.

wtbarnes commented 4 years ago

I'm also in favor of removing it. At best it is redundant as authorship is already recorded in the git history and at worst it is inaccurate. It also has to be maintained by hand.

Whichever way we decide, I don't think we should do this on a subpackage level. It should be consistent across the entire code base.

Before we make a final decision though, I think we should get the opinions of those whose names we would be removing. Searching both __author__ and __authors__, it looks like that's:

[ ] @ayshih
[x] @khughitt
[ ] @wafels
[x] @dpshelio
[ ] @ehsteve
[ ] @Hypnus1803
[ ] @mjm159
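
For anyone wanting to reproduce that search, here is a minimal sketch. It assumes the variables are plain module-level assignments. The helper names `find_authorship_dunders` and `scan_tree` are made up for illustration and are not part of sunpy.

```python
import ast
from pathlib import Path

# Module-level names discussed in this thread.
AUTHORSHIP_NAMES = {"__author__", "__authors__", "__email__"}


def find_authorship_dunders(source):
    """Return the authorship dunders assigned at module level in ``source``."""
    found = {}
    for node in ast.parse(source).body:  # top-level statements only
        if not isinstance(node, ast.Assign):
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and target.id in AUTHORSHIP_NAMES:
                try:
                    found[target.id] = ast.literal_eval(node.value)
                except (ValueError, TypeError):
                    # The value is not a plain literal; record the name only.
                    found[target.id] = None
    return found


def scan_tree(root):
    """Map each ``.py`` file under ``root`` to the authorship dunders it defines."""
    results = {}
    for path in sorted(Path(root).rglob("*.py")):
        hits = find_authorship_dunders(path.read_text(encoding="utf-8"))
        if hits:
            results[str(path)] = hits
    return results
```

Calling `scan_tree("sunpy")` from a checkout (the path is an assumption) would list the affected files and the names recorded in them.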

ayshih commented 4 years ago

If it's being stored anywhere it belongs in a nice rst doc file.

That doesn't seem practical in terms of format or maintenance. Authorship is naturally coupled with individual code files, so creating and maintaining a separate list of authorships is even more work.

Even in most cases when a file started off as the work of one or two people it rarely stays like that for very long and then the comments never get updated.

Certainly there are many code files where the authorship would be impractical to list as individuals, and should just be "SunPy collaboration" or something like that. However, there are also parts of the code – <cough> coordinates <cough> – where it really is just an extremely limited set of contributors, and may always be.

At best it is redundant as authorship is already recorded in the git history and at worst it is inaccurate.

The "redundancy" argument probably bothers me the most. Git is a development tool, not a documentation-for-users tool. Also, Git history keeps track of editors, not of authors, and it arguably can be muddled at that too. If I autopep8 a file, I shouldn't gain authorship. If I re-order function definitions in a file, I shouldn't gain authorship.

It also has to be maintained by hand.

Okay, maybe this bothers me more. We already ask "a lot" for code contributions – PEP 8, changelog entries, etc. – so it doesn't seem like a huge burden to also include updates to authorship as needed. (Admittedly, email addresses can become obsolete, but Git history doesn't fix that.)

nabobalis commented 4 years ago

That doesn't seem practical in terms of format or maintenance. Authorship is naturally coupled with individual code files, so creating and maintaining a separate list of authorships is even more work.

Nor is it practical to add a new author each time a commit is made to a file or a collection of files.

If you want to acknowledge specific people for their contribution to a file or a package, I would be OK with an acknowledgement section in a docstring at the top of the file or in the package __init__. This is more visible than __authors__ and will be in the docs as well, so more people can see it.
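
As a rough illustration of that docstring idea, the sketch below embeds a made-up module as a string and extracts its docstring the way documentation tools would. The module contents, the section title, and the name "Jane Doe" are all hypothetical.

```python
import ast

# Source of a hypothetical module; the acknowledgement lives in its docstring.
EXAMPLE_MODULE_SOURCE = '''"""
Helpers for coordinate transformations (hypothetical module).

Acknowledgements
----------------
The initial implementation was written by Jane Doe (a made-up name).
"""


def transform():
    """Placeholder function so the module has some content."""
'''

# The module docstring is what help() shows and what documentation
# generators typically pick up, unlike a bare __author__ variable.
docstring = ast.get_docstring(ast.parse(EXAMPLE_MODULE_SOURCE))
print(docstring)
```

Because the acknowledgement is ordinary docstring text, it would be reviewed and updated through the same process as the rest of the documentation.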

The "redundancy" argument probably bothers me the most. Git is a development tool, not a documentation-for-users tool. Also, Git history keeps track of editors, not of authors, and it arguably can be muddled at that too.

How is __author__ documentation for users? It doesn't give them any useful information.

If I autopep8 a file, I shouldn't gain authorship. If I re-order function definitions in a file, I shouldn't gain authorship.

Why? If someone makes a change to a file why exclude them? Why are their contributions so useless to not deserve authorship?

Okay, maybe this bothers me more. We already ask "a lot" for code contributions – PEP 8, changelog entries, etc. – so it doesn't seem like a huge burden to also include updates to authorship as needed. (Admittedly, email addresses can become obsolete, but Git history doesn't fix that.)

We (try to) ask contributors to make meaningful changes. I am not sure that this would fall into the same category.

ayshih commented 4 years ago

I don't necessarily have objections to authorship information being removed, but I strongly dislike the offered justifications: that Git history is a competent substitute (it isn't), and that we're too lazy as maintainers (I'm not, at least).

I'd like to know whether there is consensus that authorship information is something we want to record on this project. If so, we can debate how (and I don't think Git should be the way). It doesn't have to be __author__; docstrings are probably a better approach. But, any record of authorship would need to be maintained.

It could instead be the stance that authorship information should be intentionally excluded, in a more "we are one" approach. For example, the project could feel that the inevitable arguments about authorship updates – whether code changes are substantial enough to warrant gaining authorship or whether someone's contributions have been so completely replaced that he/she should lose authorship – are deleterious to the project.

Okay, I've ranted enough about this. Time to add this as a topic for the coordination meeting!

nabobalis commented 4 years ago

I don't necessarily have objections to authorship information being removed, but I strongly dislike the offered justifications: that Git history is a competent substitute (it isn't), and that we're too lazy as maintainers (I'm not, at least).

I don't disagree with you here. But for a piece of code that sees maybe 20 people working on it, I don't think the concept of authorship is clear-cut enough to warrant inclusion within our code.

Personally I think if we want to say that someone has contributed to sunpy or a piece of code, we should acknowledge them but I don't think authorship is how we should go about that.

It could instead be the stance that authorship information should be intentionally excluded, in a more "we are one" approach. For example, the project could feel that the inevitable arguments about authorship updates – whether code changes are substantial enough to warrant gaining authorship or whether someone's contributions have been so completely replaced that he/she should lose authorship – are deleterious to the project.

This, I think, should be the goal. There should only be the project, and "we" are cogs of that project.

khughitt commented 4 years ago

Fyi, I'm completely fine with either decision and leave it up to the current devs to decide :+1:

dpshelio commented 4 years ago

Late to the discussion, but here's what I think.

The danger of keeping author information for the users (i.e., not via git) is that users would be tempted to contact the author individually rather than through issues or the mailing list. That would not be good! The developer contacted would then have to make the effort to report the issue upstream.

The other point, acknowledgement, is a tricky one. People need to be acknowledged for what they do! But unless we make it easier to do so (and educate people about it), hardly anyone will check that metadata, even if it's included in docstrings. I've been planning to test ImperialCollegeLondon/R2T2, which would extract all the citations of the pieces of software you use (if annotated), but that would (normally) acknowledge the algorithm and not the implementation. I imagine we could add something similar for acknowledging the implementation if we want to do so.

If this information is to be kept to show how much I did, then I believe the git history is the way to go. Any of the developers can generate such a list and show it on their website if they wish to do so. But I would be cautious about relying on that metric alone. There's a lot of work that's not reflected in commits (code review, community interaction, mailing list discussions, ...).
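
A minimal sketch of how such a list could be generated from a local clone, assuming `git` is on the path. The function names are made up for illustration. As noted earlier in the thread, commit counts track edits rather than authorship, so the result is only a rough indicator.

```python
import subprocess


def parse_shortlog(output):
    """Parse ``git shortlog -s -n`` output into (name, commit_count) pairs."""
    counts = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line has the form "<count><TAB><name>".
        count, _, name = line.partition("\t")
        counts.append((name, int(count)))
    return counts


def contributors(path=".", repo="."):
    """List contributors to ``path`` in the clone at ``repo``, most commits first."""
    result = subprocess.run(
        # HEAD is given explicitly so git does not try to read a log from stdin.
        ["git", "shortlog", "-s", "-n", "HEAD", "--", path],
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_shortlog(result.stdout)
```

For example, `contributors("sunpy/coordinates")` run from the repository root (the path is an assumption) would return the per-person commit counts for that subpackage.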

dstansby commented 2 years ago

Coming back round to this, I'm +1 for removing authorship for the reasons already given above.

ayshih commented 2 years ago

I still feel this way:

I don't necessarily have objections to authorship information being removed, but I strongly dislike the offered justifications: that Git history is a competent substitute (it isn't), and that we're too lazy as maintainers (I'm not, at least).

I'd like to know whether there is consensus that authorship information is something we want to record on this project. If so, we can debate how (and I don't think Git should be the way). It doesn't have to be __author__; docstrings are probably a better approach. But, any record of authorship would need to be maintained.

dstansby commented 2 years ago

Do you think authorship information is something we want to record then?

ayshih commented 2 years ago

Do you think authorship information is something we want to record then?

Yes, I think it has value. Of course, if the authorship is tracked per file, there are indisputably files that are too insane – I'm looking at you, mapbase.py – to be credited more finely than "SunPy developers".

To quote myself again:

It could instead be the stance that authorship information should be intentionally excluded, in a more "we are one" approach.

If that is agreed to be the project stance, it should be explicitly documented.

dstansby commented 2 years ago

How should we come to a decision on this then? It looks like @nabobalis, @dstansby, @wtbarnes, @Cadair and possibly @dpshelio are in favour of removing it and @ayshih in favour of keeping it. @ayshih would you be happy with that being enough of a majority to decide and document that

authorship information should be intentionally excluded, in a more "we are one" approach.

?

ayshih commented 2 years ago

Maybe not "happy", but I'll certainly accept a decision made by the group as long as it's not poorly justified.

hayesla commented 1 year ago

this has come up again on the community call - I think the consensus of @wafels @nabobalis @wtbarnes and myself is that they should probably go

hayesla commented 1 year ago

maybe a page in the docs with a "thanks to" or emeritus contributors section

ayshih commented 1 year ago

My stance hasn't changed, so I'll simply reiterate that I want the justification for the decision to be rooted in aspiration (running towards a "we are one" philosophy) rather than in fear (running away from the burden of authorship deliberation and maintenance).

nabobalis commented 1 year ago

The authorship of this package is given as the sunpy community, the same goes for any publication.

This should be extended to the files that have the __author__.
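
One place that package-level authorship remains visible to users without Git is the installed distribution's metadata. The sketch below assumes the author is recorded in the standard `Author` or `Author-email` field; the comment above doesn't say where it is recorded, and `package_author` is a made-up helper.

```python
from importlib.metadata import PackageNotFoundError, metadata


def package_author(dist_name):
    """Return the author recorded in an installed distribution's metadata.

    Returns None if the distribution is not installed or lists no author.
    """
    try:
        meta = metadata(dist_name)
    except PackageNotFoundError:
        return None
    # Newer packaging tools may fill only the Author-email field.
    return meta.get("Author") or meta.get("Author-email")
```

Calling `package_author("sunpy")` in an environment where sunpy is installed would show whatever the package declares there.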

wtbarnes commented 1 year ago

How about we add something to the dev guide along the lines of

Given the wide array of contributions from many authors over a number of years, the "author" of the sunpy package should be regarded as "The SunPy Community" rather than any one individual. As such, the __author__ and __email__ module level dunder names should not be included in any source file within the sunpy package.

wtbarnes commented 1 year ago

If we wanted, we could even draft an SEP with similar language.

dstansby commented 1 year ago

this has come up again on the community call - I think the consensus of @wafels @nabobalis @wtbarnes and myself is that they should probably go

I'll add my name to this list, running bravely towards "we are one"

sunpy / sunpy

Remove authorship information from the code itself? #3650