Closed emilazy closed 4 months ago
Are you saying that I just need to add something like this snippet below? (@emilazy , @jchv)
Adapted from the CPython 3.11 imp.py code.
Copyright (c) 2001-2023 Python Software Foundation; All Rights Reserved
Originally licensed under the PSLv2 and incorporated under the LGPL 2.1.
Based on the popular https://github.com/pdbpp/pdbpp package, an example of a repo that took CPython code and modified it (compare https://github.com/pdbpp/pdbpp/blob/master/src/pdbpp.py to https://github.com/python/cpython/blob/f481a02e6c7c981d1316267bad5fb94fee912ad6/Lib/pdb.py)
E.g., more specifically: (compare https://github.com/pdbpp/pdbpp/blob/master/src/pdbpp.py#L1085 to https://github.com/python/cpython/blob/f481a02e6c7c981d1316267bad5fb94fee912ad6/Lib/pdb.py#L446) it shows clearly that they modified CPython code... and they have a BSD 3-Clause License, which sounds different from CPython's LGPL 2.1 License.
So from my pdbpp example above, here are the big questions I have:
- How is it that pdbpp is OK, whereas pynose isn't OK with respect to how things were handled?
- If what pdbpp is OK, then what does pynose need to do in order to modify CPython code without the License issue?
pdbpp is quite a bit more popular, and used by a lot of major companies. (https://github.com/pdbpp/pdbpp/network/dependents) I just want to be sure that pynose isn't being selected out unfairly, as the data I've been gathering this morning seems to make it appear that modifying CPython code is very widespread, and in many ways handled similarly to how https://github.com/pdbpp/pdbpp handled it.
Also separate from this issue is my pdbp fork to fix pdbpp. Was broken on Windows (https://github.com/pdbpp/pdbpp/issues/498). Also broken for pytest (https://github.com/pdbpp/pdbpp/issues/519). I told them about my fork, and people are quite happy that I stepped in to fix things (as I do with a lot of things in the Python ecosystem):
Back to the original topic, it appears that my pynose code has already been widely used in places such as Alpine Linux:
I'm happy to see that I made a difference in the Python ecosystem, and that lots of people are gaining value from my fixes.
I'll be waiting for a response to the two questions I posted earlier in this message in regards to pdbpp and pynose.
> Are you saying that I just need to add something like this snippet below? (@emilazy , @jchv)
>
> Adapted from the CPython 3.11 imp.py code.
> Copyright (c) 2001-2023 Python Software Foundation; All Rights Reserved
> Originally licensed under the PSLv2 and incorporated under the LGPL 2.1.
Basically, yes.
> Based on the popular https://github.com/pdbpp/pdbpp package, an example of a repo that took CPython code and modified it (compare https://github.com/pdbpp/pdbpp/blob/master/src/pdbpp.py to https://github.com/python/cpython/blob/f481a02e6c7c981d1316267bad5fb94fee912ad6/Lib/pdb.py) E.g., more specifically: (compare https://github.com/pdbpp/pdbpp/blob/master/src/pdbpp.py#L1085 to https://github.com/python/cpython/blob/f481a02e6c7c981d1316267bad5fb94fee912ad6/Lib/pdb.py#L446) it shows clearly that they modified CPython code... and they have a BSD 3-Clause License, which sounds different from CPython's LGPL 2.1 License.
Nit: CPython is mostly PSLv2 licensed, and the pdb.py library doesn't appear to have a separate license unless I missed it.
> So from my pdbpp example above, here are the big questions I have:
>
> - How is it that pdbpp is OK, whereas pynose isn't OK with respect to how things were handled?
We don't package pdbpp in NixOS, so it's a bit out of scope for us. However it appears pdbpp is out of compliance with copyright licenses it is beholden to. This should be reported upstream. If the maintainers are working in good faith then I'd hope they would be willing to fix this. It shouldn't impact much since the PSL2 is a permissive license anyways.
> - If what pdbpp is OK, then what does pynose need to do in order to modify CPython code without the License issue?
I'm not sure what this means, but what pdbpp is doing does not appear to be OK.
> pdbpp is quite a bit more popular, and used by a lot of major companies. (https://github.com/pdbpp/pdbpp/network/dependents) I just want to be sure that pynose isn't being selected out unfairly, as the data I've been gathering this morning seems to make it appear that modifying CPython code is very widespread, and in many ways handled similarly to how https://github.com/pdbpp/pdbpp handled it.
>
> Also separate from this issue is my pdbp fork to fix pdbpp. Was broken on Windows (pdbpp/pdbpp#498). Also broken for pytest (pdbpp/pdbpp#519). I told them about my fork, and people are quite happy that I stepped in to fix things (as I do with a lot of things in the Python ecosystem):
For what it's worth, just because something is popular does not mean it does not have licensing issues, or that the licensing issues can be ignored. There was a huge explosion with Ruby on Rails not that long ago due to licensing issues.
Unfortunately it's not possible to always catch these issues. A lot of smaller community projects are not fully-compliant with copyright licenses in small ways, e.g. Apache 2 technically requires that every file individually has a copyright disclaimer IIRC, but many projects don't do this. There's definitely differing levels of severity though, and "fails to disclose copyright holders and licensing obligations" is higher than most of the clerical errors.
> Back to the original topic, it appears that my pynose code has already been widely used in places such as Alpine Linux:
I hope you understand though, that this in and of itself does not actually provide any meaningful assurance that your project is actually complying with its legal obligations. Even with larger organizations who have much greater auditing standards, they mostly rely on automated scanning to detect licensing issues, but they can only do this if the projects are actually annotated properly; the tooling can't detect if code is copied from an undisclosed author with an undisclosed license.
If pynose is not updated to comply with its legal obligations, all of these downstream users will need to be informed and will probably have to find another contingency plan.
> That also means that my code can be found in Azure, AWS, Google Cloud, and Docker:
> https://azuremarketplace.microsoft.com/en-us/marketplace/apps/solvedevops1643693563360.alpine-linux
> https://aws.amazon.com/marketplace/pp/prodview-ghulxwkkimv6e
>
> I'm happy to see that I made a difference in the Python ecosystem, and that lots of people are gaining value from my fixes.
Congratulations, but I'm not really sure what this has to do with the issue other than it means there's a whole lot of people who are going to have additional copyright license auditing work.
Anyway, I hope you realize that we're not just here to be part of some weird schadenfreude hate brigade, but rather just flagging the issue because we noticed it. Here's the timeline of what happened:
- I began working on my own patches to make nose work on Python 3.12. I copied code from imp.py from Python 3.11 and adapted it into nose/importer.py. I added a note that I did this in the code (though later on it was improved, since initially it still was not fully compliant with the PSL2 terms.)
- At some point I figured out via searching GitHub that Alpine had already done this, and decided it'd be easier to just pull from Alpine Linux.
- It was pointed out to me that the licensing situation with Alpine Aports was unclear. We were not sure exactly what to do in this moment.
- We decided that most of the changes were trivial enough that they may not, on their own, be eligible for copyright protection, but the blob in nose/importer.py was. Of course, I realized immediately that it was just the PSL2-licensed code from CPython's standard library, and suggested that it would be okay.
- It was finally noticed that it was actually your patch, and this issue was created shortly after.
That's the full story.
To be honest, I would've preferred to just use pynose because it saves me/us the effort of trying to find each way that nose breaks on Python 3.12, so I find this whole thing very unfortunate.
The legal obligations that you have are pretty clearly outlined in the LICENSE files of the projects that you copied from, and while there are some gray area bits (e.g. the use of Git to carry authorship information is somewhat common practice even though it possibly makes GitHub tarball distributions a violation of some license terms) this is not one: if you copy code from somewhere it needs proper attribution and licensing. Obviously nobody can force you to adhere to it, but no amount of counterexamples of other projects violating license terms or companies that inherit that license term violation will change the underlying facts.
Also, in case it is not evident, I am not a lawyer and do not mean to construe any of this text as legal advice. It is just my understanding of the situation as a layperson.
It depends on whether you intend to include the full licence text. If this notice on its own was written independently before this issue was raised and the licence text was not included, it would at least be much less likely to amount to a material breach, as it makes an effort to credit the copyright holder of the code and refers to the licence. However, now that you are aware of the exact requirements of the licence you would certainly be expected to come into full compliance with it.
So I would say that it is likely acceptable as long as you retain the full licence text in your source repository and any packaged distributions. The PSLv2 wording “provided, however, that PSF's License Agreement and […] are retained […] in any derivative version” is less vague about mechanism than the LGPL’s “give any other recipients of the Program a copy of this License along with the Program”; since you took code from CPython to make a derivative work from, you would be expected to ensure that the full licence text from CPython is also kept. Copying CPython’s LICENSE file into your repository (you could rename it to LICENSE.cpython to make it clear that it’s not the licence of the bulk of the code) and referencing it in the notice attached to the derived code would satisfy this obligation.
I would recommend keeping the copyright notice as the verbatim Copyright (c) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023 Python Software Foundation; All Rights Reserved specified by the licence text in order to match its requirements. However, the collapse of the year ranges is a trivial difference, so it may be considered acceptable by the Python Software Foundation’s lawyers. I would recommend getting in touch with them as it is possible that you will need them to grant you a replacement licence anyway due to the clause I mentioned in my original report.
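As a rough illustration of the notice discussed above, the sketch below builds a comment header with the copyright years written out individually rather than collapsed into a range. The function names `psf_copyright_line` and `build_notice` are hypothetical, and the wording of the "Adapted from" and "Originally licensed" lines is taken from the snippet proposed earlier in this thread plus a `LICENSE.cpython` reference; none of it is text prescribed by the PSF.

```python
def psf_copyright_line(first_year=2001, last_year=2023):
    """Return the PSF copyright line with every year listed individually."""
    years = ", ".join(str(year) for year in range(first_year, last_year + 1))
    return f"Copyright (c) {years} Python Software Foundation; All Rights Reserved"


def build_notice(license_file="LICENSE.cpython"):
    """Build a comment header for code derived from CPython's Lib/imp.py.

    The first and last lines are illustrative wording only (an assumption);
    the middle line is the copyright notice with the years spelled out.
    """
    lines = [
        "Adapted from the CPython 3.11 imp.py code.",
        psf_copyright_line(),
        f"Originally licensed under the PSLv2 (see {license_file}) "
        "and incorporated under the LGPL 2.1.",
    ]
    # Prefix every line with "# " so the header can be pasted into a .py file.
    return "\n".join(f"# {line}" for line in lines)


if __name__ == "__main__":
    print(build_notice())
```

Generating the year list instead of typing it avoids accidentally dropping a year, but whether this header is sufficient is still a question for the licence text and the PSF, not for the script.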
About pdb++: to clarify, CPython is under the PSLv2 licence, not LGPL 2.1. As PSLv2 is a permissive licence it is acceptable to include derivative works of portions of it in LGPL 2.1 works, but the licences are quite different. In particular, it is also okay to include derivative works of CPython code in BSD‐licensed projects.
The specific function you linked may be too simple to fall under copyright protection, but if there is more copying along those lines, given that I can’t find any visible attribution of the PSF’s copyright or a copy of the PSLv2 licence, my answer is that it wouldn’t be okay for pdb++ either.
The only reason I’m reporting these issues with pynose and not pdb++ is because pynose came up in my work on NixOS. I haven’t looked at pdb++ because we don’t ship that code in any form, so its licence compliance is of no concern to us; there’s no singling out here beyond what concerns are applicable to us as a downstream distribution and raised to our attention. I would recommend you contact the pdb++ upstream and/or the Python Software Foundation if you have worries about their use of CPython code.
I expect that, like NixOS, Alpine was not aware of the use of CPython code when incorporating this patch. I’ve let Alpine know about this report so they can follow along and keep updated on the progress:
A pull request is now available: https://github.com/mdmintz/pynose/pull/34
The PR has been merged! Thank you @emilazy and @jchv for assisting!
Great, I'm glad this could be resolved in a quick and amicable manner. I think downstreams can now have pretty good confidence there are no serious license/copyright issues here.
Thanks!
Good work everyone. This ticket can probably be closed off. I've raised the upstream compliance issue with PSF and the upstream. I don't have much hope of an upstream resolution given the 3+ years since commits but we tried.
@mdmintz to allay any fears of non-compliance you may want to independently reach out to the PSF legal guys and get their opinion. I don't think that anybody would argue that you haven't made a good-faith attempt to adhere to the T&Cs.
The following code was introduced to nose/importer.py in b5247565df1652e4e4a74ff69b3cfe6fa7db3f05:
https://github.com/mdmintz/pynose/blob/cc8654687a7cdbbfbe5d441650b21715c2b1127e/nose/importer.py#L21-L125
This code is clearly a derivative work of the since‐removed CPython Lib/imp.py file, with most functions and documentation being clearly based on the CPython code, some with no changes at all.
The original code is copyrighted by the Python Software Foundation, and released under the terms of the Python Software Foundation License Version 2. Derivative works are permitted, and there is no obstacle to including such a derivative work in a larger work licensed under the LGPL, but there are conditions; here is a relevant excerpt:
By my reading, the following requirements to distribute a derivative work of this CPython code were not met:
- inclusion of the LICENSE text, either directly in the relevant file or elsewhere in the source repository;
- inclusion of the PSF’s notice of copyright, i.e. Copyright (c) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023 Python Software Foundation; All Rights Reserved, with the code that is a derivative work of code the PSF owns;
- inclusion of a brief summary of the changes made to the original code, as it is based on/incorporating a part of CPython.
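The three items above can be turned into a crude presence check. The following is only a sketch: the function name `find_missing_items`, the `LICENSE.cpython` file name, and the "Adapted from" marker used to detect a change summary are all assumptions made for illustration, and passing the check says nothing about actual legal compliance.

```python
from pathlib import Path

# Assumed marker for the PSF notice; only the holder part is matched so that
# both the verbatim and the collapsed year forms are detected.
PSF_HOLDER = "Python Software Foundation; All Rights Reserved"


def find_missing_items(repo_root, derived_file,
                       license_file="LICENSE.cpython",
                       changes_marker="Adapted from"):
    """Return a list of problems found; an empty list means all checks passed."""
    root = Path(repo_root)
    problems = []

    # 1. The licence text should be present somewhere in the repository.
    if not (root / license_file).is_file():
        problems.append(f"missing licence text: {license_file}")

    source_path = root / derived_file
    if not source_path.is_file():
        problems.append(f"missing derived file: {derived_file}")
        return problems
    source = source_path.read_text(encoding="utf-8")

    # 2. The PSF copyright notice should accompany the derived code.
    if PSF_HOLDER not in source:
        problems.append("missing PSF copyright notice")

    # 3. A brief summary of changes should be present (detected by a marker).
    if changes_marker not in source:
        problems.append("missing summary of changes")

    return problems
```

A call such as `find_missing_items(".", "nose/importer.py")` would then list whichever of the three items it could not find by plain text search.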
These need to be corrected for legal compliance with the licence granted by the copyright holders of the code this derivative work was based on. However, there is unfortunately an additional potential complication:
Unlike most modern licences, this does not include any grace period or clause to restore the licence if the breach is corrected. If your licence to prepare and distribute derivative works has been terminated due to non‐compliance with the terms, then the code in nose/importer.py would constitute an illegal infringement of copyright.
In this case, however, I am sure that if the breach was accidental and you come into compliance with the licence requirements, the PSF would have no reason to stop you using the code; I would recommend addressing these compliance issues promptly and then contacting the Python Software Foundation at psf@python.org to inform them of this issue and that you have addressed it upon being notified, and ask them to either confirm that they do not consider this to have constituted a material breach of the licence terms or, if they do consider you to have materially breached them, to license you to use the CPython code again under the same terms.