mdmintz / pynose

pynose fixes nose to extend unittest and make testing easier
https://pypi.org/project/pynose/
GNU Lesser General Public License v2.1

Lack of attribution and licence information for code derived from CPython #33

Closed: emilazy closed this issue 2 months ago

emilazy commented 2 months ago

The following code was introduced to nose/importer.py in b5247565df1652e4e4a74ff69b3cfe6fa7db3f05:

https://github.com/mdmintz/pynose/blob/cc8654687a7cdbbfbe5d441650b21715c2b1127e/nose/importer.py#L21-L125

This code is clearly a derivative work of the since‐removed CPython Lib/imp.py file, with most functions and documentation being clearly based on the CPython code, some with no changes at all.

The original code is copyrighted by the Python Software Foundation, and released under the terms of the Python Software Foundation License Version 2. Derivative works are permitted, and there is no obstacle to including such a derivative work in a larger work licensed under the LGPL, but there are conditions; here is a relevant excerpt:

2. Subject to the terms and conditions of this License Agreement, PSF hereby
grants Licensee a nonexclusive, royalty-free, world-wide license to reproduce,
analyze, test, perform and/or display publicly, prepare derivative works,
distribute, and otherwise use Python alone or in any derivative version,
provided, however, that PSF's License Agreement and PSF's notice of copyright,
i.e., "Copyright (c) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010,
2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023 Python Software Foundation;
All Rights Reserved" are retained in Python alone or in any derivative version
prepared by Licensee.

3. In the event Licensee prepares a derivative work that is based on
or incorporates Python or any part thereof, and wants to make
the derivative work available to others as provided herein, then
Licensee hereby agrees to include in any such work a brief summary of
the changes made to Python.

By my reading, the following requirements to distribute a derivative work of this CPython code were not met:

  1. inclusion of the LICENSE text, either directly in the relevant file or elsewhere in the source repository;

  2. inclusion of the PSF’s notice of copyright, i.e. Copyright (c) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023 Python Software Foundation; All Rights Reserved, with the code that is a derivative work of code the PSF owns;

  3. inclusion of a brief summary of the changes made to the original code, as it is based on/incorporating a part of CPython.

These need to be corrected for legal compliance with the licence granted by the copyright holders of the code this derivative work was based on. However, there is unfortunately an additional potential complication:

6. This License Agreement will automatically terminate upon a material
breach of its terms and conditions.

Unlike most modern licences, this does not include any grace period or clause to restore the licence if the breach is corrected. If your licence to prepare and distribute derivative works has been terminated due to non‐compliance with the terms, then the code in nose/importer.py would constitute an illegal infringement of copyright.

In this case, however, I am sure that if the breach was accidental and you come into compliance with the licence requirements, the PSF would have no reason to stop you from using the code. I would recommend addressing these compliance issues promptly and then contacting the Python Software Foundation at psf@python.org. Tell them about this issue and that you addressed it upon being notified. Then ask them either to confirm that they do not consider this a material breach of the licence terms or, if they do consider it a material breach, to license you to use the CPython code again under the same terms.

mdmintz commented 2 months ago

Are you saying that I just need to add something like this snippet below? (@emilazy , @jchv)

Adapted from the CPython 3.11 imp.py code.
Copyright (c) 2001-2023 Python Software Foundation; All Rights Reserved
Originally licensed under the PSLv2 and incorporated under the LGPL 2.1.

The popular https://github.com/pdbpp/pdbpp package is an example of a repo that took CPython code and modified it (compare https://github.com/pdbpp/pdbpp/blob/master/src/pdbpp.py to https://github.com/python/cpython/blob/f481a02e6c7c981d1316267bad5fb94fee912ad6/Lib/pdb.py). More specifically, comparing https://github.com/pdbpp/pdbpp/blob/master/src/pdbpp.py#L1085 to https://github.com/python/cpython/blob/f481a02e6c7c981d1316267bad5fb94fee912ad6/Lib/pdb.py#L446 shows clearly that they modified CPython code... and they have a BSD 3-Clause License, which sounds different from CPython's LGPL 2.1 License.

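So from my pdbpp example above, here are the big questions I have:

  • How is it that pdbpp is OK, whereas pynose isn't OK with respect to how things were handled?
  • If what pdbpp is OK, then what does pynose need to do to in order to modify CPython code without the License issue?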

pdbpp is quite a bit more popular, and used by a lot of major companies. (https://github.com/pdbpp/pdbpp/network/dependents) I just want to be sure that pynose isn't being selected out unfairly, as the data I've been gathering this morning seems to make it appear that modifying CPython code is very widespread, and in many ways handled similarly to how https://github.com/pdbpp/pdbpp handled it.

Also, separate from this issue, there is my pdbp fork to fix pdbpp, which was broken on Windows (https://github.com/pdbpp/pdbpp/issues/498) and also broken for pytest (https://github.com/pdbpp/pdbpp/issues/519). I told them about my fork, and people are quite happy that I stepped in to fix things (as I do with a lot of things in the Python ecosystem):

pdbp comes to save the day

Back to the original topic, it appears that my pynose code has already been widely used in places such as Alpine Linux:

That also means that my code can be found in Azure, AWS, Google Cloud, and Docker:

I'm happy to see that I made a difference in the Python ecosystem, and that lots of people are gaining value from my fixes.


I'll be waiting for a response to the two questions I posted earlier in this message regarding pdbpp and pynose.

jchv commented 2 months ago

Are you saying that I just need to add something like this snippet below? (@emilazy , @jchv)

Adapted from the CPython 3.11 imp.py code.
Copyright (c) 2001-2023 Python Software Foundation; All Rights Reserved
Originally licensed under the PSLv2 and incorporated under the LGPL 2.1.

Basically, yes.

The popular https://github.com/pdbpp/pdbpp package is an example of a repo that took CPython code and modified it (compare https://github.com/pdbpp/pdbpp/blob/master/src/pdbpp.py to https://github.com/python/cpython/blob/f481a02e6c7c981d1316267bad5fb94fee912ad6/Lib/pdb.py). More specifically, comparing https://github.com/pdbpp/pdbpp/blob/master/src/pdbpp.py#L1085 to https://github.com/python/cpython/blob/f481a02e6c7c981d1316267bad5fb94fee912ad6/Lib/pdb.py#L446 shows clearly that they modified CPython code... and they have a BSD 3-Clause License, which sounds different from CPython's LGPL 2.1 License.

Nit: CPython is mostly PSLv2 licensed, and the pdb.py library doesn't appear to have a separate license unless I missed it.

So from my pdbpp example above, here are the big questions I have:

  • How is it that pdbpp is OK, whereas pynose isn't OK with respect to how things were handled?

We don't package pdbpp in NixOS, so it's a bit out of scope for us. However, it appears pdbpp is out of compliance with copyright licenses it is beholden to. This should be reported upstream. If the maintainers are working in good faith, then I'd hope they would be willing to fix this. It shouldn't have much impact, since the PSLv2 is a permissive license anyway.

  • If what pdbpp is OK, then what does pynose need to do to in order to modify CPython code without the License issue?

I'm not sure what this means, but what pdbpp is doing does not appear to be OK.

pdbpp is quite a bit more popular, and used by a lot of major companies. (https://github.com/pdbpp/pdbpp/network/dependents) I just want to be sure that pynose isn't being selected out unfairly, as the data I've been gathering this morning seems to make it appear that modifying CPython code is very widespread, and in many ways handled similarly to how https://github.com/pdbpp/pdbpp handled it.

Also separate from this issue is my pdbp fork to fix pdbpp. Was broken on Windows (pdbpp/pdbpp#498). Also broken for pytest (pdbpp/pdbpp#519). I told them about my fork, and people are quite happy that I stepped in to fix things (as I do with a lot of things in the Python ecosystem):

For what it's worth, just because something is popular does not mean it is free of licensing issues, or that those issues can be ignored. Not that long ago, there was a huge blowup in the Ruby on Rails ecosystem over the licensing of the mimemagic gem.

Unfortunately, it's not always possible to catch these issues. A lot of smaller community projects are not fully compliant with copyright licenses in small ways. For example, Apache 2 technically requires that every file individually has a copyright disclaimer IIRC, but many projects don't do this. There are definitely differing levels of severity, though, and "fails to disclose copyright holders and licensing obligations" is more serious than most clerical errors.

Back to the original topic, it appears that my pynose code has already been widely used in places such as Alpine Linux:

I hope you understand, though, that this in and of itself does not provide any meaningful assurance that your project is complying with its legal obligations. Even larger organizations with much stricter auditing standards mostly rely on automated scanning to detect licensing issues. That scanning only works if projects are annotated properly: the tooling can't detect code that was copied from an undisclosed author under an undisclosed license.

If pynose is not updated to comply with its legal obligations, all of these downstream users will need to be informed and will probably have to find another contingency plan.

That also means that my code can be found in Azure, AWS, Google Cloud, and Docker:

I'm happy to see that I made a difference in the Python ecosystem, and that lots of people are gaining value from my fixes.

Congratulations, but I'm not really sure what this has to do with the issue other than it means there's a whole lot of people who are going to have additional copyright license auditing work.


Anyway, I hope you realize that we're not just here to be part of some weird schadenfreude hate brigade, but rather just flagging the issue because we noticed it. Here's the timeline of what happened:

That's the full story.

To be honest, I would've preferred to just use pynose because it saves me/us the effort of trying to find each way that nose breaks on Python 3.12, so I find this whole thing very unfortunate.

The legal obligations that you have are pretty clearly outlined in the LICENSE files of the projects that you copied from. There are some gray areas: for example, using Git to carry authorship information is fairly common practice, even though it possibly makes GitHub tarball distributions a violation of some license terms. This is not one of them. If you copy code from somewhere, it needs proper attribution and licensing. Obviously nobody can force you to adhere to it, but no number of counterexamples of other projects violating license terms, or of companies that inherit those violations, will change the underlying facts.


Also, in case it is not evident, I am not a lawyer and do not mean to construe any of this text as legal advice. It is just my understanding of the situation as a layperson.

emilazy commented 2 months ago

It depends on whether you intend to include the full licence text. If this notice on its own was written independently before this issue was raised and the licence text was not included, it would at least be much less likely to amount to a material breach, as it makes an effort to credit the copyright holder of the code and refers to the licence. However, now that you are aware of the exact requirements of the licence you would certainly be expected to come into full compliance with it.

So I would say that it is likely acceptable as long as you retain the full licence text in your source repository and any packaged distributions. The PSLv2 wording “provided, however, that PSF's License Agreement and […] are retained […] in any derivative version” is less vague about mechanism than the LGPL’s “give any other recipients of the Program a copy of this License along with the Program”; since you took code from CPython to make a derivative work from, you would be expected to ensure that the full licence text from CPython is also kept. Copying CPython’s LICENSE file into your repository (you could rename it to LICENSE.cpython to make it clear that it’s not the licence of the bulk of the code) and referencing it in the notice attached to the derived code would satisfy this obligation.

I would recommend keeping the copyright notice as the verbatim Copyright (c) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023 Python Software Foundation; All Rights Reserved specified by the licence text in order to match its requirements. However, the collapse of the year ranges is a trivial difference, so it may be considered acceptable by the Python Software Foundation’s lawyers. I would recommend getting in touch with them as it is possible that you will need them to grant you a replacement licence anyway due to the clause I mentioned in my original report.

About pdb++: to clarify, CPython is under the PSLv2 licence, not LGPL 2.1. As PSLv2 is a permissive licence it is acceptable to include derivative works of portions of it in LGPL 2.1 works, but the licences are quite different. In particular, it is also okay to include derivative works of CPython code in BSD‐licensed projects.

The specific function you linked may be too simple to fall under copyright protection, but if there is more copying along those lines, given that I can’t find any visible attribution of the PSF’s copyright or a copy of the PSLv2 licence, my answer is that it wouldn’t be okay for pdb++ either.

The only reason I’m reporting these issues with pynose and not pdb++ is because pynose came up in my work on NixOS. I haven’t looked at pdb++ because we don’t ship that code in any form, so its licence compliance is of no concern to us; there’s no singling out here beyond what concerns are applicable to us as a downstream distribution and raised to our attention. I would recommend you contact the pdb++ upstream and/or the Python Software Foundation if you have worries about their use of CPython code.

I expect that, like NixOS, Alpine was not aware of the use of CPython code when incorporating this patch. I’ve let Alpine know about this report so they can follow along and keep updated on the progress:

mdmintz commented 2 months ago

A pull request is now available: https://github.com/mdmintz/pynose/pull/34

mdmintz commented 2 months ago

The PR has been merged! Thank you @emilazy and @jchv for assisting!

jchv commented 2 months ago

Great, I'm glad this could be resolved in a quick and amicable manner. I think downstreams can now have pretty good confidence there are no serious license/copyright issues here.

Thanks!

Kangie commented 2 months ago

Good work, everyone. This ticket can probably be closed. I've raised the upstream compliance issue with the PSF and with the upstream project. I don't have much hope of an upstream resolution, given that there have been no commits in 3+ years, but we tried.

@mdmintz, to allay any fears of non-compliance, you may want to reach out independently to the PSF's legal team and get their opinion. I don't think anybody would argue that you haven't made a good-faith attempt to adhere to the T&Cs.