technomancy / leiningen

Moved to Codeberg; this is a convenience mirror
https://codeberg.org/leiningen/leiningen

Support for SPDX license identifiers and expressions #2249

Open pmonks opened 7 years ago

pmonks commented 7 years ago

It would be pretty awesome if Leiningen were to embrace the Linux Foundation's SPDX collaborative project for everything related to licensing.

There are many possibilities here, but at a minimum being able to declare a Clojure project's license(s) using SPDX license expressions would be a major win for downstream consumers. SPDX license identifiers alone provide much needed canonical forms that uniquely identify most open source licenses, and license expressions build upon the identifiers to support the weird and wonderful world of dual / alternate / "or any later version" licensing.

This might be as simple as updating Leiningen (and associated tooling e.g. lein-licenses) to be sensitive to a :spdx-license-identifier element (perhaps as a new top-level key, or perhaps in the :license / :licenses maps) in a project.clj, that would contain an SPDX license expression (the naming of the key is a little wonky, but that's a historical oddity in SPDX).
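As a rough sketch of the second placement mentioned above (the key nested inside the existing :license map), a project.clj might look something like the following. The project coordinates are placeholders, and :spdx-license-identifier is only the key proposed in this issue; nothing here implies that Leiningen or lein-licenses interprets it.

```clojure
;; Hypothetical project.clj fragment.
;; com.example/my-lib and its version are placeholders.
;; :spdx-license-identifier is the key proposed in this issue; it is an
;; assumption, not an existing Leiningen feature.
(defproject com.example/my-lib "0.1.0-SNAPSHOT"
  :description "Example library"
  :license {:name "Eclipse Public License 1.0"
            :url  "https://www.eclipse.org/legal/epl-v10.html"
            :spdx-license-identifier "EPL-1.0"})
```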

While it's possible to jam keys such as this into a project.clj already, unless the tooling is sensitive to it, it doesn't provide much value downstream.

winks commented 7 years ago

So if I understood this correctly, at minimum:

  1. add :spdx-license-identifier to :license in project.clj for leiningen itself
  2. add :spdx-license-identifier to :license/:licenses in project.clj for the template that's being used for a new project.clj
    1. document this new key/value pair
  3. make lein-licenses aware of this.

and according to SPDX putting that into every source file is also encouraged, but as the template is quite minimal I'm not sure this is a good idea.

pmonks commented 7 years ago

Yep I think that's about the sum of it, and I suspect the bulk of the work will be to make lein-licenses aware of the new property. Where I think this might get a bit tricky is in the (inevitable) case of having a dependency tree with a mixture of SPDX and non-SPDX license declarations e.g. should they be normalised to SPDX, or not, or ...? 🤔

I'd argue that that's a separate enhancement to lein-licenses though, rather than a necessary question to be resolved as part of this issue.

And while I've never consumed a project.clj with dual licensing either, apparently such things do exist. SPDX license expressions would have made this case pretty straightforward, as they're still just a single string literal in project.clj: :spdx-license-identifier "(EPL-1.0 AND Apache-2.0)".

I suspect lein-licenses wouldn't need to do much more than regurgitate the expression string literal verbatim, regardless of whether it's an expression or merely a single identifier. For my use case, the output of lein-licenses is parsed by a human being rather than any further tooling (although SPDX opens up the possibility to make that more automated, post-lein-licenses).
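To illustrate the "regurgitate verbatim" idea, here is a minimal sketch of how a reporting tool could pick the string to print. The function name declared-license and the "unknown" fallback are hypothetical, and the project map is assumed to carry the proposed key at the top level; this is not lein-licenses' actual code.

```clojure
;; Hypothetical helper: returns the license string a report would print.
;; Assumes the proposed :spdx-license-identifier key sits at the top level
;; of the project map; falls back to the existing :license :name.
(defn declared-license
  [project]
  (or (:spdx-license-identifier project)   ; printed verbatim, never parsed
      (get-in project [:license :name])    ; pre-SPDX style declaration
      "unknown"))

;; Works the same for a single identifier or a full expression.
(declared-license {:spdx-license-identifier "(EPL-1.0 AND Apache-2.0)"})
(declared-license {:license {:name "Eclipse Public License 1.0"}})
```

Because the expression is treated as an opaque string, the tool would not need an SPDX parser for this minimal level of support.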

pmonks commented 7 years ago

> and according to SPDX putting that into every source file is also encouraged, but as the template is quite minimal I'm not sure this is a good idea.

Yeah I'm not proposing that Leiningen or lein-licenses tries to inject or derive SPDX identifiers or expressions from individual source files. Just that it support (in project.clj) what SPDX calls a "Package License Declared" (i.e. a single license expression for the entire project).

By way of background, SPDX appears to have grown up in the C / Linux Kernel community, where developers apparently copy source files into their own code from time to time, resulting in a mishmash of licenses in the same source code project. That's pretty unusual in "modern" ecosystems such as Clojure, where generally speaking if you're dependent on someone else's code, you'll express that as a library dependency in your build and therefore never have to touch or look at their source files. And typically such libraries are licensed under a single license.

Thankfully we probably don't have to worry too much about the nasty case of "copy & paste" reuse. 😉

technomancy commented 7 years ago

I've never seen a project.clj with dual licensing, so on the one hand that might be tricky to implement in :license (if nobody did it), or trivial, as it would just be adding the key/values here.

We already support a vector of :licenses instead of :license if you have more than one.
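For reference, a project declaring more than one license with :licenses might look roughly like this. The project coordinates are placeholders and the two licenses are chosen purely for illustration.

```clojure
;; Illustrative project.clj fragment using a vector of license maps.
;; com.example/my-lib and the choice of licenses are placeholders.
(defproject com.example/my-lib "0.1.0-SNAPSHOT"
  :licenses [{:name "Eclipse Public License 1.0"
              :url  "https://www.eclipse.org/legal/epl-v10.html"}
             {:name "Apache License, Version 2.0"
              :url  "https://www.apache.org/licenses/LICENSE-2.0"}])
```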

I am quite skeptical that library authors will ever fill out additional metadata for something like this as they are not the ones who benefit from it--the only reason :license gets filled out at all is 0) it's part of lein new output and 1) lein deploy yells at you otherwise.

But it can't hurt.

The original intent of :license, BTW, is to include it inside the pom.xml file, which is the primary descriptor of the library from a distribution perspective since it's a first-class artifact when you deploy to a remote repository. If we allow richer descriptions of licenses, it would be ideal if both representations of the license could reflect it, but that might not be feasible depending on the schema for pom files.

pmonks commented 7 years ago

> We already support a vector of :licenses instead of :license if you have more than one.

Sure, though that isn't sufficient to unambiguously express dual licensing (AND), alternative licensing (OR), "or any later version" licensing (+), or combinations of these things (all of which SPDX license expressions can express).
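A few illustrative expression strings, keyed by the cases listed above, may make the difference from a plain vector clearer. The map keys are labels invented for this example, and the particular license combinations are arbitrary.

```clojure
;; Example SPDX license expressions, one per case named above.
;; The keyword labels are hypothetical; only the string values matter.
{:dual         "EPL-1.0 AND Apache-2.0"           ; both licenses apply
 :alternative  "EPL-1.0 OR Apache-2.0"            ; the recipient picks one
 :or-later     "EPL-1.0+"                         ; this version or any later one
 :combination  "(EPL-1.0 OR Apache-2.0) AND MIT"} ; operators nested with parentheses
```

A two-element :licenses vector cannot distinguish the first two cases, whereas each string here carries the operator explicitly.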

> I am quite skeptical that library authors will ever fill out additional metadata for something like this as they are not the ones who benefit from it--the only reason :license gets filled out at all is 0) it's part of lein new output and 1) lein deploy yells at you otherwise.

Agreed. Look how long it took GitHub to even support optional license information in the repositories they host, for example. #sad

> But it can't hurt.

Agreed. It also opens the door for clever plugins (or enhancements to lein-licenses) to calculate this information and inject it back into project.clj (or submit a PR upstream, if the operator is not the author). SPDX provides some guidance for doing this kind of thing, and there are several tools that partially do this already.

> The original intent of :license, BTW, is to include it inside the pom.xml file, which is the primary descriptor of the library from a distribution perspective since it's a first-class artifact when you deploy to a remote repository. If we allow richer descriptions of licenses, it would be ideal if both representations of the license could reflect it, but that might not be feasible depending on the schema for pom files.

Yeah. I guess one fallback would be to stick an XML comment in the pom.xml containing the SPDX license expression, but that's pretty horrendous.

There are also some signs of progress around SPDX support in Maven, and it might be possible to hitch a wagon onto that train, if they get so far as to start pushing for enhancements to the POM?

Ultimately I think that's a battle that can be fought down the line - just having the Clojure ecosystem embrace SPDX would be a win, even if the underlying Java ecosystem is still stuck in the 1990s.

(yes I was a Maven user from v1.0, and yes I am still getting expensive therapy as a result!) 😜

pmonks commented 7 years ago

In thinking a little more about this, it may make more sense for :spdx-license-identifier to be a top-level key, rather than going in either :license or :licenses. It clearly doesn't make much sense inside :licenses (since SPDX expressions natively support lists of licenses), and could even be considered an alternative to :license.
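A sketch of the top-level variant, with :license kept alongside it so that existing tooling still sees a familiar map. As before, the project coordinates are placeholders and the key is only the one proposed in this issue.

```clojure
;; Hypothetical project.clj fragment with the proposed key at the top level.
;; com.example/my-lib and its version are placeholders; the key is an
;; assumption from this issue, not an existing Leiningen feature.
(defproject com.example/my-lib "0.1.0-SNAPSHOT"
  :license {:name "Eclipse Public License 1.0"
            :url  "https://www.eclipse.org/legal/epl-v10.html"}
  :spdx-license-identifier "EPL-1.0")
```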

I don't have a strong opinion either way (I'd need to think more about it), but something to ponder on...