purcell / package-lint

A linting library for elisp package metadata
GNU General Public License v3.0

Check for conforming licences #83

Open purcell opened 7 years ago

purcell commented 7 years ago

All elisp packages for versions of Emacs that support package.el must be licenced under GPLv3+ or compatible licences.

We should report errors if this cannot easily be determined to be the case by looking at package headers. Some guidelines about the text people should include are in this comment.

Tagging @tarsius here.

purcell commented 7 years ago

Probably @tarsius has some code we could use here, since he recently wrote code to audit published packages' licences. :-)

alphapapa commented 7 years ago

Really? Why is this? Not that I object, just curious.

tarsius commented 7 years ago

Really? Why is this? Not that I object, just curious.

For the Emacsmirror I wrote some code to detect the license a long time ago, and a few weeks ago I published some statistics on emacs-devel. I thought people would be happy that most packages are released under some license authored by the FSF. However, it turned out that Richard was instead deeply troubled that there are packages that are not released under a GPLv3-compatible license, including GPLv2-licensed packages. (I do not share his opinion that because Emacs is released under the GPLv3, all "This file is not part of Emacs" Elisp has to be released under a compatible license.)

https://www.youtube.com/watch?v=TmQKihNpsHk#t=3m19s

As a result I had to do damage control and quickly improve my extraction tools (there were ~600 packages whose license it could not detect initially) and contact maintainers to ask them to change the license so we could continue to distribute them while keeping Richard happy.

That was my main intention when working on the license detection code: to match as many weird ways of specifying the license as possible, so that the situation would, as quickly as possible, not look worse than it actually is, and to have mostly reliable lists of maintainers to contact.

And it was a mess I wrote a long time ago to begin with, so it is not quite ready to be used by package-lint yet. Eventually I would like to get it into acceptable shape, and then it can be used here. You can find it here: https://github.com/emacscollective/elx/tree/license-wip. You can of course already use it now, but note that I will continue to make significant changes in a rather disorganized fashion, including to the function signature.

Fanael commented 7 years ago

But why? Packages can use incompatible licenses just fine; all it would prevent is redistributing them with Emacs itself (and that's irrelevant because Emacs is not free software anyway, what with only the FSF being allowed to contribute); people can download and use them without issue.

To me, it sounds like needless pandering to the FSF.

tarsius commented 7 years ago

But why? Packages can use incompatible licenses just fine; all it would prevent is redistributing them with Emacs itself (and that's irrelevant because Emacs is not free software anyway, what with only the FSF being allowed to contribute); people can download and use them without issue.

Because Richard disagrees with that. I think he is wrong.

Fanael commented 7 years ago

I'd suggest Richard put his code where his mouth is and rewrite the "offending" packages, then. Third-party package authors are under no obligation to pander to him.

alphapapa commented 7 years ago

It's even weirder when compared with gcc. This would seem like saying that any software compiled with GCC must be licensed with GPLv3. Or that any software written in Guile must be. Obviously that's not the case.

https://www.youtube.com/watch?v=TmQKihNpsHk#t=3m19s

I think I know which of the methods is relevant here. ;)

Fanael commented 7 years ago

It's even weirder when compared with gcc. This would seem like saying that any software compiled with GCC must be licensed with GPLv3.

GCC is a funny example, because Linux uses GCC plugins licensed under GPLv2 only as a part of the kernel build, while GCC is GPLv3+. Yes, they're incompatible, no, nobody cares.

purcell commented 7 years ago

This would seem like saying that any software compiled with GCC must be licensed with GPLv3.

No, that's not the same. It's more like saying any GCC extensions must be licenced with GPL.

I'm absolutely not a GPL zealot, and also not a lawyer, but there's at least a borderline reasonable argument that all elisp is an Emacs extension, which runs only inside Emacs, and is therefore subject to GPL. And if that is legally the case, then MELPA etc. may not redistribute de facto GPL'd code under incompatible terms, e.g. under an author's invalid or missing licence terms.

MELPA now therefore firmly requires packages to be licenced with GPLv3+ or a compatible licence, and my goal here is simply to reduce the MELPA maintenance burden.

Fanael commented 7 years ago

It's more like saying any GCC extensions must be licenced with GPL.

Exactly, and there's a real-world example of people writing, using and redistributing GPLv3-incompatible (namely, GPLv2-only) GCC plugins that I mentioned earlier.

there's at least a borderline reasonable argument that all elisp is an Emacs extension, which runs only inside Emacs

I actually tested some of my packages under SBCL, and they worked with minor modifications, so there's that 😉

Then again, 2-clause BSD is GPL-compatible, so I'm in the clear.

then MELPA etc. may not redistribute de facto GPL'd code under incompatible terms

Why not? GPL incompatibility would matter only if repositories redistributed GPL-ed code bundled together with GPL-incompatible code, which they do not, since they just redistribute each package separately.

de facto GPL'd code

That's not how licensing works. Even if a work is a derivative of some other GPL work, it's not automatically GPL'd. The author of the derived work would "just" be violating the license of the upstream work, and if somebody redistributed the derived work on the basis that it's de facto GPL, they'd be "guilty" of "copyright infringement". I remember this exact thing happening with some leaked exFAT (IIRC) drivers.


Disclaimer: I am not a lawyer (besides, even if I were one, that would still mean squat, because the great thing about law is that there are so many jurisdictions to choose from), and I believe that the very concept of "copyright" and "intellectual property" is cancerous and should be abolished.

purcell commented 7 years ago

Yep, I agree with a bunch of this, but there's not much point us all debating how we think things should work or should be interpreted. We (the MELPA maintainers) are pursuing the path which puts us least at odds with the core Emacs community and FSF because we have limited time to waste on this stuff.

Disclaimer: I am not a lawyer (besides, even if I were one, that would still mean squat, because the great thing about law is that there are so many jurisdictions to choose from), and I believe that the very concept of "copyright" and "intellectual property" is cancerous and should be abolished.

I believe this is exactly the point of GPL, fwiw.

joewreschnig commented 7 years ago

I actually tested some of my packages under SBCL, and they worked with minor modifications

RMS has said that if this is the case, the programs are probably not a derivative work of Emacs:

As for the more general question, we think that a program that uses Emacs facilities needs to be GPL-covered, but a program that just uses the Lisp language could have any license--it is not affected by the license of Emacs.

However, this is a (probably vanishingly) small set of Emacs packages.

purcell commented 4 years ago

See also #117. If we standardised on requiring/recommending License: <SPDX License ID> headers, we could actually check for conforming licences. (As a non-US English speaker, my poor brain melts a little every time I make it write "license" instead of "licence".)

alphapapa commented 4 years ago

(As a non-US English speaker, my poor brain melts a little every time I make it write "license" instead of "licence".)

@purcell Would you say that the spelling of license causes your brain to incense? :)

purcell commented 4 years ago

AAARRGH

lassik commented 4 years ago

Some tools using SPDX metadata expect the license to be stated like this:

;; SPDX-License-Identifier: GPL-3.0-or-later

Could we simply use that one? Are there particular tools around Emacs that expect the keyword to be ;; License: ...?
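For illustration, here is a minimal Python sketch (not package-lint's actual code, which is Emacs Lisp) of reading both the SPDX-License-Identifier comment and the Licen[sc]e header keyword from a file's comment lines. The function name read_license_headers and the rule of keeping only the first occurrence of each header are assumptions made for this example.

```python
import re

# Matches comment header lines such as
#   ;; SPDX-License-Identifier: GPL-3.0-or-later
#   ;; License: GPL-3.0-or-later   (or the British spelling "Licence")
# Only spaces/tabs are allowed between the parts, so a match never
# spills onto the following line.
HEADER_RE = re.compile(
    r"^;+[ \t]*(SPDX-License-Identifier|Licen[sc]e)[ \t]*:[ \t]*(\S.*?)[ \t]*$",
    re.MULTILINE,
)


def read_license_headers(source: str) -> dict[str, str]:
    """Return {'spdx': ..., 'license': ...} for whichever headers exist.

    Hypothetical helper: only the first occurrence of each header is kept.
    """
    found: dict[str, str] = {}
    for keyword, value in HEADER_RE.findall(source):
        key = "spdx" if keyword == "SPDX-License-Identifier" else "license"
        found.setdefault(key, value)
    return found
```

For example, read_license_headers(";; SPDX-License-Identifier: GPL-3.0-or-later") returns {'spdx': 'GPL-3.0-or-later'}. A header with an empty value is ignored rather than picking up text from the next line.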

lassik commented 4 years ago

Those SPDX-License-Identifier comments are proving so popular that they're even being added to the Linux kernel: https://github.com/torvalds/linux/blob/master/tools/include/tools/endian.h

tarsius commented 4 years ago

I am still using elx-license from my elx package, which I mentioned above. It can determine the license of all 5634 packages that are available from the Emacsmirror and all 1245 packages that are available from the Emacsattic. That includes all packages that are available from Melpa.

TL;DR If Melpa and/or package-lint wants to do that sort of thing too, then it should use elx-license.

See https://emacsmirror.net/stats/licenses.html for statistics about the used licenses.

The reason elx-license can do that is not that each and every package author specifies the license in a reasonable fashion. Instead this tool goes to great lengths detecting the license even if it is specified in a rather wacky way.

elx-license tries hard to always return the same string for a given license regardless of how it was specified. Of course people occasionally invent new wacky ways to specify the license but I usually catch that quickly.

At present elx-license does not use all the same identifiers as the SPDX License List. The reason for that could be that they changed it since I last worked on this. I do remember having used some "authoritative list of license identifiers" but don't remember if it was an older version of this one or a completely different one. In any case using the SPDX identifiers is probably a good idea and I will look into that.

These are the various methods used by elx-license in order:

  1. The preferred method to specify the license is using a known permission statement at the beginning of the "main library". (The main library is the library whose name matches the name of the package.)

    This is what the authors of Emacs packages are told to do in the Emacs documentation, so we should definitely accept this way of specifying the license and not insist on it being specified in some other way.

    elx-license supports the permission statements of six families of licenses, specified by the variables elx-{gnu,bsd,mit,isc,cc,wtf}-permission-statement-regexp. I do not intend to support any more of the less commonly used licenses this way.

  2. Then we try the value specified using the License or Licence header keyword as specified in the main library. We validate the value; if it cannot be identified as a GNU license, then we ignore it for now but might later come back to it.

    I will extend that to also support SPDX-License-Identifier. I cannot remember seeing this even once in an elisp library, and I have looked at many, but if that is the currently fashionable way to specify the license, then it should of course be supported.

  3. Then we try the licensee utility. This is the same tool GitHub uses. It extracts the license from the LICENSE file (or similar).

    If this tool is missing, then we skip to the next step.

  4. Then we try some wacky non-standard permission statements that some people use for GNU licenses.

  5. Then we try the library header again. This time around we accept the license if it is a recognized non-GNU license. (GNU licenses were already accepted in (2).)

  6. Then we try again for some wacky non-standard permission statements that some people use. This time we look for non-GNU licenses; we did the GNU licenses in (4).

  7. Finally we use a list of 18 hard-coded package-to-license mappings. That's 18 out of 7324.
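The ordering above amounts to a fallback chain: each method is tried in turn and the first one that yields an answer wins. The following minimal Python sketch shows only that control flow. It is not a port of elx-license (which is Emacs Lisp), and it models only steps 2, 5 and 7. The detector functions, the package dict shape and the GNU_IDS, KNOWN_NON_GNU and HARD_CODED tables are hypothetical stand-ins.

```python
from typing import Callable, Optional

# A detector looks at a package and returns a license identifier, or None
# when it cannot decide. These are hypothetical stand-ins, not elx-license code.
Detector = Callable[[dict], Optional[str]]

GNU_IDS = {"GPL-3.0-or-later", "GPL-2.0-or-later", "LGPL-3.0-or-later"}
KNOWN_NON_GNU = {"MIT", "BSD-2-Clause", "ISC"}
HARD_CODED = {"some-package": "MIT"}  # hypothetical step-7 mapping


def gnu_header(pkg: dict) -> Optional[str]:
    """Step 2: accept the License header only if it names a GNU license."""
    value = pkg.get("license_header")
    return value if value in GNU_IDS else None


def non_gnu_header(pkg: dict) -> Optional[str]:
    """Step 5: retry the same header, now accepting known non-GNU licenses."""
    value = pkg.get("license_header")
    return value if value in KNOWN_NON_GNU else None


def hard_coded(pkg: dict) -> Optional[str]:
    """Step 7: last resort, a manual package-to-license mapping."""
    return HARD_CODED.get(pkg["name"])


def detect_license(pkg: dict, detectors: list[Detector]) -> Optional[str]:
    """Try each detector in order; the first non-None answer wins."""
    for detector in detectors:
        result = detector(pkg)
        if result is not None:
            return result
    return None


CHAIN = [gnu_header, non_gnu_header, hard_coded]
```

For example, detect_license({"name": "foo", "license_header": "MIT"}, CHAIN) gets no answer from step 2 and returns "MIT" from step 5. Keeping the steps as a plain ordered list makes it easy to see, and change, which evidence takes precedence.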

Yes this is rather wacky. That's because it deals with reality and I've seen things you people wouldn't believe...

I don't think we should purge packages just because the license is specified in a way that is not sanctioned, if we can still determine it. Within reason, that is.

I am completely on board with telling the maintainers of the packages for which we need (7) to use (1-3), or else. We could also do so for at least some of the packages that need (4) or (6). In any case I do not want to be involved in the process of going around and urging maintainers to "specify their licenses better". (I have done similar things in the past. A lot. I am done with it.) But if we decide that the license has to be specified in a "somewhat reasonable way, which happens to be used by more than a single maintainer" or a package is removed after a grace period, then I can get behind that.

lassik commented 4 years ago

You are a true Emacs hero. That is experience speaking.

purcell commented 4 years ago

Yeah, that's awesome. Rather than check that a tool which goes to heroically great lengths to figure out the licence will succeed with the code being linted, I'd prefer to just give a clear indication to the author that they should use an easy-to-detect and acceptable method of specifying their licence.

I think that might mean that package-lint would accept the standard GPL boilerplate if present, and otherwise would warn that the SPDX- header should be added. And we'd have a list of GPL-compatible IDs for that header.
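Here is a minimal Python sketch of the rule proposed above. It assumes a hypothetical check_license function that returns (severity, message) pairs. If an SPDX-License-Identifier header is present, its ID is checked against a list of GPL-compatible IDs. If not, the standard GPLv3+ notice is accepted. Otherwise a warning asks for the header. The GPL_COMPATIBLE set is an illustrative subset, and treating an unlisted ID as an error is an assumption. Only the first token of the header is read, so compound SPDX expressions such as "GPL-3.0-or-later OR MIT" are not evaluated.

```python
import re

# Illustrative subset only, not a reviewed list of GPL-compatible IDs.
GPL_COMPATIBLE = {
    "GPL-3.0-or-later", "GPL-2.0-or-later", "LGPL-3.0-or-later",
    "MIT", "BSD-2-Clause", "BSD-3-Clause", "ISC",
}

SPDX_RE = re.compile(
    r"^;+[ \t]*SPDX-License-Identifier[ \t]*:[ \t]*(\S+)", re.MULTILINE
)
# Key phrase of the standard GPLv3+ permission notice.
GPL3_PLUS_NOTICE = (
    "either version 3 of the License, or (at your option) any later version"
)


def comment_text(source: str) -> str:
    """Strip leading ';' markers and collapse whitespace so that a notice
    spread over several comment lines can be matched as one string."""
    stripped = re.sub(r"^[ \t]*;+[ \t]?", "", source, flags=re.MULTILINE)
    return " ".join(stripped.split())


def check_license(source: str) -> list[tuple[str, str]]:
    """Return (severity, message) pairs under the proposed rule."""
    match = SPDX_RE.search(source)
    if match:
        spdx_id = match.group(1)
        if spdx_id in GPL_COMPATIBLE:
            return []
        return [("error", f"{spdx_id} is not in the list of GPL-compatible IDs")]
    if GPL3_PLUS_NOTICE in comment_text(source):
        return []  # the standard GPLv3+ boilerplate is accepted as-is
    return [("warning", "add an SPDX-License-Identifier header")]
```

Normalising the comment text first means the check does not depend on where the notice happens to wrap.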

Fanael commented 4 years ago

I'd prefer to just give a clear indication to the author that they should use an easy-to-detect and acceptable method of specifying their licence.

I agree, as a linting tool we should be enforcing best practices, so looking for the GPL boilerplate and/or an SPDX header is what I'd prefer to do too.

lassik commented 4 years ago

Here's the SPDX license metadata: https://github.com/spdx/license-list-data/blob/master/json/licenses.json Unfortunately, I can't find a field that says whether each license is GPL-compatible.

lassik commented 4 years ago

Also worth keeping in mind:

purcell commented 4 years ago

This'll be fun.

lassik commented 4 years ago

I love how everyone from the usual MELPA toil gang is gathered in one thread. @riscy has been spared so far, but not any more :D Naturally enough it's the most tedious of all maintenance topics: not only license incompatibility, but license notice incompatibility :)

tarsius commented 4 years ago

I agree, as a linting tool we should be enforcing best practices, so looking for the GPL boilerplate and/or an SPDX header is what I'd prefer to do too.

I was conflating package-lint and Melpa above. Of course a linter should not go as far as elx-license. (I might add an elx-license-strict, though.)

We should not remove any (or only very few) packages from Melpa because of how they specify the license. But we can, and I agree we should, impose some restrictions on packages that are being added to Melpa.

I think that might mean that package-lint would accept the standard GPL boilerplate if present, and otherwise would warn that the SPDX- header should be added. And we'd have a list of GPL-compatible IDs for that header.

We should also continue to support the Licen[sc]e header because that's what we have been advertising as well as the various values we suggested in the past.

lassik commented 4 years ago

We should not remove any (or only very few) packages from Melpa because of how they specify the license. But we can, and I agree we should, impose some restrictions on packages that are being added to Melpa.

Fully agreed.

We should also continue to support the Licen[sc]e header because that's what we have been advertising as well as the various values we suggested in the past.

Do you have a list of those values?

Here's a scraping of the SPDX metadata to get a list of "good" license identifiers: https://misc.lassi.io/2019/package-lint-licenses/licenses-good.el

...where "good" means all of these apply:

The problem is, SPDX doesn't have GPL compatibility metadata. "FSF libre" is much more broad than that. But it may be close enough for a linter.

purcell commented 4 years ago

Re. GPL compatibility, we'd have to go by the list of compatible licenses on the GNU page, really. I think a simple set of checks would look like this:

This way, people who had used auto-insert (or otherwise have the standard GPL boilerplate) would be encouraged to start using SPDX-.... To get clean package-lint status, they'd have to explicitly add that licence header, but if they have GPLv3 boilerplate there would be only a warning, and not a hard error.

Fanael commented 4 years ago
  • SPDX- or License header names a GPL version older than v3 => warning

I believe that should be an error, because GPL 1 and 2 aren't compatible with 3.

purcell commented 4 years ago

I believe that should be an error, because GPL 1 and 2 aren't compatible with 3.

Yes, potentially.

lassik commented 4 years ago

GPL 1 and 2 aren't compatible with 3.

GPL-2.0-or-later is compatible with GPL 3. GPL-2.0-only is not.

We could just throw a lint error on GPL 1. GPL 2 is from 1991 :)
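The version rules from the last few comments can be written down as a small classifier over SPDX GPL identifiers. In this Python sketch, gpl_verdict is a hypothetical helper. Anything naming GPL 3 is accepted, and so is GPL-2.0-or-later, since it can be used under GPLv3. GPL-2.0-only is rejected, and any GPL 1 identifier is rejected outright, as suggested above. Deprecated identifiers such as GPL-2.0+ are deliberately not recognised and raise ValueError.

```python
import re

# Current SPDX GPL identifiers, e.g. GPL-2.0-only or GPL-3.0-or-later.
GPL_ID_RE = re.compile(r"^GPL-([123])\.0-(only|or-later)$")


def gpl_verdict(spdx_id: str) -> str:
    """Return 'ok' or 'error' for a GPL SPDX identifier (hypothetical helper)."""
    m = GPL_ID_RE.match(spdx_id)
    if m is None:
        raise ValueError(f"not a current SPDX GPL identifier: {spdx_id}")
    version, scope = m.group(1), m.group(2)
    if version == "1":
        return "error"  # reject GPL 1 outright, as suggested in the thread
    if version == "2" and scope == "only":
        return "error"  # GPL-2.0-only cannot be combined with GPLv3 code
    return "ok"         # GPL-2.0-or-later and GPL-3.0-* work with GPLv3
```

Matching the exact identifier shape, rather than searching for "GPL-2" substrings, keeps "-only" and "-or-later" from being confused.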

lassik commented 4 years ago

If we take the short list from the SPDX list and manually remove everything that GNU doesn't explicitly say is GPL-compatible, would that be a good list for the linter?

purcell commented 4 years ago

Yes, I think that would be ideal.
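One way to build that list, sketched in Python, is to keep only the non-deprecated entries of the SPDX licenses.json data whose IDs also appear in a hand-maintained set of licenses that the GNU page explicitly calls GPL-compatible. The field names licenseId and isDeprecatedLicenseId follow my understanding of the published JSON and should be checked against the actual file. The GNU_SAYS_COMPATIBLE set shown is an illustrative subset, not the reviewed list, and good_license_ids is a hypothetical name.

```python
import json

# Hand-maintained: IDs the GNU license list explicitly calls GPL-compatible.
# Illustrative subset only.
GNU_SAYS_COMPATIBLE = {
    "GPL-3.0-or-later", "LGPL-3.0-or-later", "MIT", "BSD-3-Clause", "ISC",
}


def good_license_ids(spdx_json: str, allowlist: set[str]) -> list[str]:
    """Intersect current (non-deprecated) SPDX IDs with the allowlist."""
    data = json.loads(spdx_json)
    current_ids = (
        entry["licenseId"]
        for entry in data.get("licenses", [])
        if not entry.get("isDeprecatedLicenseId", False)
    )
    return sorted(i for i in current_ids if i in allowlist)
```

Intersecting with the SPDX data has a side benefit: a typo in the hand-maintained set simply drops out instead of becoming an accepted identifier.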

lassik commented 4 years ago

Please "enjoy" my review: https://misc.lassi.io/2019/package-lint-licenses/licenses-good-gpl.el

(https://www.reddit.com/r/UnnecessaryQuotes/)

So here's the final list of completely unproblematic licenses:

lassik commented 4 years ago

We could further split that list into niche licenses mainly used by one organization or one software community:

The popular licenses that are left are,

GPL:

LGPL:

Permissive:

Fanael commented 4 years ago

The two-clause BSD license is FSF and OSI approved and GPL compatible, but is not listed, and there are notable Emacs packages licensed under it, for example most of my packages.

lassik commented 4 years ago

@Fanael True: https://github.com/spdx/license-list-data/issues/52 It's just being fixed.

lassik commented 4 years ago

There are probably some non-OSI-approved licenses that have some existing packages under them. I'd imagine Unlicense and WTFPL fall into this category.

lassik commented 4 years ago

There's some data about GPL compatibility in a "Free Software Foundation API", which looks like it's unofficial but a very useful service anyway. The SPDX team will consider adding similar info to their metadata: https://github.com/spdx/license-list-XML/issues/934

lassik commented 3 years ago

I have some spare energy to tackle this. @purcell What should we do? The license review above still seems OK to me. Is it too much if package-lint expects an ;; SPDX-License-Identifier: comment with an unproblematic license and warns otherwise?

How official is ;; Licen[cs]e: ..., should we detect that one, and what to do about it? At the very least, we could check that if both License and SPDX-License-Identifier are present, they have the same value.

purcell commented 3 years ago

I have some spare energy to tackle this.

Yay, thanks!

@purcell What should we do? The license review above still seems OK to me. Is it too much if package-lint expects an ;; SPDX-License-Identifier: comment with an unproblematic license and warns otherwise?

I'm not sure I understand the question: are you suggesting alternative rules to what I suggested above? That's fine, but maybe we could enumerate a set of rules?

How official is ;; Licen[cs]e: ..., should we detect that one, and what to do about it? At the very least, we could check that if both License and SPDX-License-Identifier are present, they have the same value.

I think that's fairly official - or at least very widely used - but its format is loose. My take: if there's a Licen[cs]e and it is either SPDX-format or otherwise trivially determined to be valid, then that's fine. But probably we should complain if both headers are supplied (even if they match?).

wasamasa commented 3 years ago

I think that's fairly official - or at least very widely used - but its format is loose. My take: if there's a Licen[cs]e and it is either SPDX-format or otherwise trivially determined to be valid, then that's fine. But probably we should complain if both headers are supplied (even if they match?).

If they match (however you'd detect that), it's fine. For me as a package author it makes sense to add the SPDX header in addition to the license blurb. Removing the license blurb is not an option, that would increase overall churn for no good reason and make it less clear what the license terms are.
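Here is a minimal Python sketch of the "if they match, it's fine" rule. When both a Licen[sc]e header and an SPDX-License-Identifier header are present, the loose License value is normalised and compared with the SPDX ID, and any mismatch is reported. The ALIASES table and the check_header_agreement name are hypothetical. A real alias list would need to cover the values that have actually been suggested in the past.

```python
import re

HEADER_RE = re.compile(
    r"^;+[ \t]*(SPDX-License-Identifier|Licen[sc]e)[ \t]*:[ \t]*(\S.*?)[ \t]*$",
    re.MULTILINE,
)

# Hypothetical aliases mapping loose "License:" values to SPDX IDs.
ALIASES = {
    "gplv3+": "GPL-3.0-or-later",
    "gpl-3+": "GPL-3.0-or-later",
    "mit": "MIT",
}


def normalize(value: str) -> str:
    """Map a known loose spelling to its SPDX ID; leave others unchanged."""
    return ALIASES.get(value.lower(), value)


def check_header_agreement(source: str) -> list[str]:
    """Report a problem only if both headers exist and disagree."""
    headers: dict[str, str] = {}
    for keyword, value in HEADER_RE.findall(source):
        key = "spdx" if keyword == "SPDX-License-Identifier" else "license"
        headers.setdefault(key, value)
    if "spdx" in headers and "license" in headers:
        if normalize(headers["license"]) != headers["spdx"]:
            return [
                f"License header {headers['license']!r} does not match "
                f"SPDX-License-Identifier {headers['spdx']!r}"
            ]
    return []
```

Under this rule, a file with only one of the two headers passes the agreement check. Whether that header names an acceptable license is a separate check.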

purcell commented 3 years ago

For me as a package author it makes sense to add the SPDX header in addition to the license blurb.

Yes, fair enough.

Removing the license blurb is not an option

I don't think anyone is suggesting that here.