jaraco / inflect

Correctly generate plurals, ordinals, indefinite articles; convert numbers to words
https://pypi.org/project/inflect
MIT License
957 stars 107 forks

Copyright notice removed #197

Open jaraco opened 1 year ago

jaraco commented 1 year ago

In https://github.com/jaraco/inflect/commit/e640d3a08b1e55095be378b4f4b6e4b6806eb7fb#r120831163, @pwdyson expressed concern about the removal of the copyright notice. In that commit, I'd given some allusion to the motivation behind it, but I'm opening this issue to capture in more detail the motivation and to address Paul's request (alongside #194).

jaraco commented 1 year ago

As I maintain hundreds of projects in the Python ecosystem, maintaining an accurate and meaningful system for tracking attribution and copyright is crucial, and a simple notice "Copyright (year) (person)" is rarely adequate.

In https://github.com/jaraco/skeleton/issues/78, I was exploring the motivations behind copyright notices and what's necessary and what's valuable. In that research, it became apparent to me that a single line for copyright in a long-lived package is unlikely to be close to accurate.

Consider, for example, the copyright removed from this package. It indicated that the code released in 2023 was copyright solely by Paul Dyson in 2010. That's obviously not right. It's based on code written by (and thus copyrighted to) Paul Dyson back in 2010 (and probably before and after).

The problem with the "copyright {year} {entity}" notice is that it's antiquated. It's based on a distribution model in which software was released infrequently and maintained largely in relative isolation with little or no access to the source development history.

In today's world, open source code is rapidly developed in the open with sophisticated tooling for tracking provenance and attribution, rendering the one-line copyright notice more incorrect than correct.

In addition to the relatively permanent and fine-grained source history, there's also a publicly-managed PyPI repository of releases that retains the attribution that was present in each release.

Consider the git fame for this repository:

 inflect main @ git fame
Processing: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:00<00:00, 44.20file/s]
Total commits: 925
Total ctimes: 3363
Total files: 130
Total loc: 8472
| Author               |   loc |   coms |   fils |  distribution   |
|:---------------------|------:|-------:|-------:|:----------------|
| Jason R. Coombs      |  5149 |    615 |     37 | 60.8/66.5/28.5  |
| Paul Dyson           |  1840 |    144 |     16 | 21.7/15.6/12.3  |
| Alex Grönholm        |   520 |     13 |     16 | 6.1/ 1.4/12.3   |
| David Gilman         |   410 |     13 |      6 | 4.8/ 1.4/ 4.6   |
| Hugo                 |   249 |     20 |      6 | 2.9/ 2.2/ 4.6   |
| Vamsi Atluri         |    72 |     13 |      4 | 0.8/ 1.4/ 3.1   |
| Thea                 |    42 |      6 |      2 | 0.5/ 0.6/ 1.5   |
| James Addison        |    29 |     19 |      3 | 0.3/ 2.1/ 2.3   |
| Thorben Krüger       |    23 |      5 |      2 | 0.3/ 0.5/ 1.5   |
| Sviatoslav Sydorenko |    21 |      5 |      3 | 0.2/ 0.5/ 2.3   |
| Niels Mündler        |    19 |     12 |      2 | 0.2/ 1.3/ 1.5   |
| KOLANICH             |    15 |      1 |      1 | 0.2/ 0.1/ 0.8   |
| Mitch Price          |    14 |      2 |      2 | 0.2/ 0.2/ 1.5   |
| Daniel Foerster      |    12 |      6 |      3 | 0.1/ 0.6/ 2.3   |
| Dimitri Papadopoulos |     9 |      1 |      4 | 0.1/ 0.1/ 3.1   |
| David Lord           |     7 |      3 |      2 | 0.1/ 0.3/ 1.5   |
| Joyce                |     5 |      1 |      1 | 0.1/ 0.1/ 0.8   |
| hugovk               |     5 |      1 |      1 | 0.1/ 0.1/ 0.8   |
| Katelyn Gigante      |     5 |      2 |      1 | 0.1/ 0.2/ 0.8   |
| paul                 |     4 |      1 |      1 | 0.0/ 0.1/ 0.8   |
| Anderson Bravalheri  |     3 |      2 |      2 | 0.0/ 0.2/ 1.5   |
| David J. Malan       |     3 |      1 |      1 | 0.0/ 0.1/ 0.8   |
| benthor              |     2 |      1 |      1 | 0.0/ 0.1/ 0.8   |
| Stanislav Zmiev      |     2 |      1 |      1 | 0.0/ 0.1/ 0.8   |
| Josiah VanderMey     |     2 |      1 |      2 | 0.0/ 0.1/ 1.5   |
| Alan Fregtman        |     1 |      1 |      1 | 0.0/ 0.1/ 0.8   |
| Khuyen Tran          |     1 |      1 |      1 | 0.0/ 0.1/ 0.8   |
| Sebastian Kriems     |     1 |      1 |      1 | 0.0/ 0.1/ 0.8   |
| Hugo van Kemenade    |     1 |      5 |      1 | 0.0/ 0.5/ 0.8   |
| Skyler Berg          |     1 |      1 |      1 | 0.0/ 0.1/ 0.8   |
| R                    |     1 |      1 |      1 | 0.0/ 0.1/ 0.8   |
| kimgerdes            |     1 |      1 |      1 | 0.0/ 0.1/ 0.8   |
| AndrewGnagy          |     1 |      1 |      1 | 0.0/ 0.1/ 0.8   |
| Tyr Mactire          |     1 |      2 |      1 | 0.0/ 0.2/ 0.8   |
| Brian Rutledge       |     1 |      1 |      1 | 0.0/ 0.1/ 0.8   |
| Al Johri             |     0 |      1 |      0 | 0.0/ 0.1/ 0.0   |
| Filipe Rodrigues     |     0 |      1 |      0 | 0.0/ 0.1/ 0.0   |
| Floyd Hightower      |     0 |      1 |      0 | 0.0/ 0.1/ 0.0   |
| Jennifer Richards    |     0 |      2 |      0 | 0.0/ 0.2/ 0.0   |
| Jie Bao              |     0 |      1 |      0 | 0.0/ 0.1/ 0.0   |
| Louis Sautier        |     0 |      1 |      0 | 0.0/ 0.1/ 0.0   |
| Maciej Urbanski      |     0 |      2 |      0 | 0.0/ 0.2/ 0.0   |
| MapleCCC             |     0 |      1 |      0 | 0.0/ 0.1/ 0.0   |
| Nicolas Ward         |     0 |      1 |      0 | 0.0/ 0.1/ 0.0   |
| Shaun Patterson      |     0 |      1 |      0 | 0.0/ 0.1/ 0.0   |
| Tim Gates            |     0 |      1 |      0 | 0.0/ 0.1/ 0.0   |
| U-GRMSASIA\212329932 |     0 |      1 |      0 | 0.0/ 0.1/ 0.0   |
| Vincent Fazio        |     0 |      1 |      0 | 0.0/ 0.1/ 0.0   |
| Zach Burnett         |     0 |      1 |      0 | 0.0/ 0.1/ 0.0   |
| johnthagen           |     0 |      1 |      0 | 0.0/ 0.1/ 0.0   |
| layday               |     0 |      1 |      0 | 0.0/ 0.1/ 0.0   |
| wim glenn            |     0 |      1 |      0 | 0.0/ 0.1/ 0.0   |
| yavar                |     0 |      2 |      0 | 0.0/ 0.2/ 0.0   |

By my count, that's 56 authors having contributed in some way to this project. And at the same time, that list and the former copyright both fail to capture the copyright held by projects from which this project was derived (both in spirit and in code).

The references in https://github.com/jaraco/skeleton/issues/78 reinforced my understanding that the inclusion of an explicit, one-line "copyright notice" is antiquated and obviated by law, practice, and the modern reality.

My preference is to work in the open, be transparent and fair and equitable, and avoid unnecessary, toilsome, inaccurate notices for the sake of outdated convention.

That said, Paul Dyson's opinion gets an outsized influence on the decision here, given his work as a seminal author.

@pwdyson Can you tell me more about your motivation for wanting to keep the copyright notice with your name? The project still lists you as primary author, although my intention is to eventually remove hard-coded author/maintainer and derive it instead from SCM history. Are you concerned with attribution, legal implications, or something else? Help me understand what you'd like to achieve and I'd like to work with you to achieve your objectives.
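As a rough illustration of what deriving authors from SCM history could look like, here is a minimal sketch. It is an assumption for discussion, not something this project does today: `parse_shortlog` and `authors_from_history` are hypothetical helper names, and the only external piece relied on is the `git shortlog -sne` command, which prints one line per author with a commit count, name, and email.

```python
import re
import subprocess

# Each line of `git shortlog -sne` looks like "   615\tName <email>".
_LINE = re.compile(r"^\s*(\d+)\s+(.*?)\s*<([^>]*)>\s*$")


def parse_shortlog(text):
    """Return [(name, email, commits), ...] parsed from shortlog output."""
    authors = []
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:
            commits, name, email = match.groups()
            authors.append((name, email, int(commits)))
    return authors


def authors_from_history(repo="."):
    """Hypothetical helper: ask git for per-author commit counts in `repo`."""
    result = subprocess.run(
        # HEAD is passed explicitly so git does not wait for input on stdin.
        ["git", "shortlog", "-sne", "HEAD"],
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_shortlog(result.stdout)
```

A list like this counts commits rather than lines and does not merge the same person appearing under several names, so it would still need curation (for example, a `.mailmap` file) before it could stand in for hand-written author metadata.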

jaraco commented 1 year ago

From pwdyson in #194:

First, let me say thanks for continuing to maintain this project. Without your work to keep this package up to date, what I state below would not be possible.

I feel pretty strongly about keeping my copyright notice on this package. However, I am not against other people being acknowledged, either by name or in some sort of generic statement. We can discuss the wording.

There are a number of reasons for wanting to keep the copyright statement, but here is just one.

Having my name there increases my professional reputation. I am a software engineer working in a large corporation. As you would expect, our internal code repository does not include the names of the employees who work on the code. However, if you search the repository for my name, it is there, because inflect is one of the many third party software libraries the corporation uses. So I can ask colleagues to search for my name in the code repository and they find it. This has a “Wow!” factor that impresses people and enhances my professional reputation. Removing my name reduces my professional reputation as I could no longer do this at work. I am hoping that my work does not ingest a current version of inflect without my name on it. To prevent this from happening I want the copyright notice restored as soon as possible.

This needs to be a copyright notice because not all of the files in the project are kept by my employer. Copyright, however, is something that large corporations respect and will not remove.

The year 2010 enhances my reputation as it shows I’ve been around for a while. This is seen favourably, particularly by many employees in their 20s that I work with. Also, it is a soft signal to legal that I am not working on this code any more and so the corporation does not have any ownership of the code.

jaraco commented 1 year ago

And jayaddison:

The way I'd think of it is: think of a book, document or article that has someone's name attached to it as one of the authors. If you were that person, would you be OK with your name being removed when the article is being reprinted in an updated format, because it's more convenient for the publisher?

I think that the vast majority of people would say 'no' - they'd prefer their name to be retained on the work. That's independent of what public opinion is about them and of any claims that the publisher makes in that regard -- it can be equally important to remember what someone said, regardless of how history views them. In addition, some people might under some circumstances be OK with their name being removed - or even want it to be. Those situations could indicate some other kind of problem, but it's unclear what the nature of that might be.

Either way: if Paul has expressed an opinion for his name to remain in the copyright notices for the work, and if the two of you have not agreed and confirmed to remove it, then I think that it should stay in place.

jaraco commented 1 year ago

I've been pondering this issue and I have some thoughts.

> Copyright, however, is something that large corporations respect and will not remove.

I reject this rationale as a good reason to utilize a copyright notice. It suggests that if I have an important message that I wish not to be removed downstream, I could add it to the copyright notice. For example,

Copyright 2023 Tabs are better than spaces

I think you'd agree that such a notice would be an abuse of copyright, used only to elevate the message's influence and durability.

> Having my name there increases my professional reputation.

> The year 2010 enhances my reputation as it shows I’ve been around for a while.

These are important details that I agree should be acknowledged somewhere that's likely (and verifiably) included in the source. I believe that would better be done in a specific acknowledgements section that avoids abuse of the copyright (even if unintentional), especially when the copyright conveys increasingly incorrect information.

> The way I'd think of it is: think of a book, document or article that has someone's name attached to it as one of the authors. If you were that person, would you be OK with your name being removed when the article is being reprinted in an updated format, because it's more convenient for the publisher?

I believe this analogy is a poor one for the reasons mentioned above. Software is not, and has not been for a very long time, analogous to document publishing. Instead, the content is continuously evolving with the copyright held by a complex history of contributors and dates. We're not talking about reprinting in an updated format, but continued evolution, maintenance, improvements, and refactorings.

After all, what should happen should I choose to refactor the code and move some of the code to another module? Should the copyright be replicated? Should it remain in the original file (inflect.py) as the sole remaining content?

There's an additional problem of it being a derived work. If I understand copyright correctly, the original copyright is not held solely by the original author of the Python library if it was derived from another work, but is also held by the authors of the Perl module. In fact, I seem to recall having removed Perl code from this codebase that was presumably copied directly from that project.

All of these concerns leave me unconvinced that retaining the copyright is the best option for the project.

That said, out of respect for Paul, my plan is to restore the copyright and then propose alternative forms that meet the same goals but avoid the pitfalls laid out above.