SPDX identifiers for licenses? #251

stain commented 3 years ago

This thread tries to gather existing best practice, or to help establish one, and to hear other views.

license property vs SPDX identifier

The license property refers to a CreativeWork or URL and is of course useful, particularly on all kinds of creative works beyond documents.

It is now common best practice in open source software to use SPDX ids for identifying source code's license; you may have come across code comments like:

# SPDX-License-Identifier: GPL-2.0-or-later
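Such comments are easy to pick up mechanically. Here is a minimal Python sketch, assuming the tag and its expression sit on a single line. The function name `find_spdx_expression` is hypothetical and not part of any SPDX tooling.

```python
import re

# Matches "SPDX-License-Identifier: <expression>" anywhere on a line,
# ignoring an optional closing comment marker (*/ or -->) at the end.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(.+?)\s*(?:\*/|-->)?\s*$")


def find_spdx_expression(source_text):
    """Return the first SPDX expression declared in source_text, or None."""
    for line in source_text.splitlines():
        match = SPDX_TAG.search(line)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    print(find_spdx_expression("# SPDX-License-Identifier: GPL-2.0-or-later\n"))
```

The returned string is the raw expression, so it may still be a compound such as a dual license rather than a single license id.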

But the license property requires a URL or CreativeWork, so which one to use? And can we classify these with SPDX identifiers even if a specialized license file (with copyright) is linked to? How do we deal with dual licensing?

SPDX intro lists known open source licenses. These are great as you avoid confusions such as "What do you mean 'BSD license', 2-clause, 3-clause or 4-clause?" - the unambiguous BSD-3-Clause can be looked up in that list.

SPDX has known licenses expressed as RDF like (simplified):

        a                             spdx:License ;
        rdfs:comment                  "This license was released: June 1991. This license identifier refers to the choice to use code under GPL-2.0-or-later (i.e., GPL-2.0 or some later version), as distinguished from use of code under GPL-2.0-only. The license notice (as seen in the Standard License Header field below) states which of these applies the code in the file. The example in the exhibit to the license shows the license notice for the \"or later\" approach." ;
        rdfs:seeAlso                  "" , "" ;
        spdx:isFsfLibre               "true" ;
        spdx:isOsiApproved            "true" ;
        spdx:licenseId                "GPL-2.0-or-later" ;
        spdx:name                     "GNU General Public License v2.0 or later" ;

(This RDF seems to exist only on GitHub. Some microdata is embedded, but it gets the subject wrong.)

Using SPDX URIs as @id

So the simple approach, shown in schemaorg/schemaorg#1928, is to just use these URIs directly - @njh has opted for the https instead of the http variant:

{
  "@context": "",
  "@type": "SoftwareApplication",
  "name": "ArduinoJson",
  "url": "",
  "author": {
    "@type": "Person",
    "name": "Benoit Blanchon"
  },
  "license": ""
}

Many URIs

Many of the licenses have their own URIs as well, and then the usual http vs https etc, so we could have many potential inconsistencies:

For listing/mapping has a nice list, but it's custom JSON.


The SPDX website is inconsistent with its own RDF and links to (notice https and html), so I guess many will get the alternative URIs. I have also seen the variant @njh uses as the most common, e.g. we refer to it from
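To make the inconsistency less painful for consumers, the variants could be folded back to the bare identifier. This is a rough Python sketch, assuming the variants differ only in scheme (http vs https), an optional `.html` or `.json` suffix and an optional trailing slash on `spdx.org/licenses/`. The function name `spdx_id_from_url` is made up for illustration.

```python
import re

# Assumed URL shape: http(s)://spdx.org/licenses/<id>[.html|.json][/]
SPDX_URL = re.compile(
    r"^https?://spdx\.org/licenses/"
    r"(?P<id>[A-Za-z0-9.+-]+?)"
    r"(?:\.html|\.json)?/?$"
)


def spdx_id_from_url(url):
    """Return the SPDX license id encoded in url, or None if it is not an SPDX URL."""
    match = SPDX_URL.match(url.strip())
    return match.group("id") if match else None


if __name__ == "__main__":
    for candidate in (
        "http://spdx.org/licenses/GPL-2.0-or-later",
        "https://spdx.org/licenses/GPL-2.0-or-later.html",
    ):
        print(candidate, "->", spdx_id_from_url(candidate))
```

This only recognizes SPDX-hosted URLs. A license's own home page, or a specialized local license file, returns None and would need a separate mapping.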

SPDX identifiers do not just identify a single license; they can also be expressions covering dual licenses like MIT OR Apache-2.0, or exceptions. Some licenses, like BSD-3-Clause, are templates requiring a copyright year and copyright holder, so the actual license URL would be a specialized file, which would then not be immediately recognizable as the BSD 3-Clause license.
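To illustrate why an expression cannot be treated as a single license URI, here is a small Python sketch that pulls the individual license ids out of a simple expression. It assumes uppercase AND, OR and WITH operators, and it does not handle LicenseRef or DocumentRef forms. `license_ids` is a hypothetical helper, not a full SPDX expression parser.

```python
import re

OPERATORS = {"AND", "OR", "WITH"}  # assumed to be written in uppercase


def license_ids(expression):
    """Return the license ids in a simple SPDX expression, in order, without duplicates."""
    # Splitting on anything outside the id alphabet drops parentheses and whitespace.
    tokens = re.findall(r"[A-Za-z0-9.+-]+", expression)
    ids = []
    previous = None
    for token in tokens:
        if token in OPERATORS:
            previous = token
            continue
        if previous == "WITH":
            # The token after WITH names an exception, not a license.
            previous = None
            continue
        previous = None
        if token not in ids:
            ids.append(token)
    return ids


if __name__ == "__main__":
    print(license_ids("MIT OR Apache-2.0"))
```

A trailing `+` is kept as part of the token, so the "or later" meaning is not silently lost.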

Using identifier from CreativeWork

One way around this could be to use identifier on an anonymous or local CreativeWork license resource. Of course, setting the SPDX expression directly as identifier would be easiest, but it leaves a bit too much implied:

{ "@id": "workflow.cwl",
  "@type": "SoftwareSourceCode",
  "license": {
      "@id": "",
      "@type": "CreativeWork",
      "name": "CC BY 4.0",
      "description": "Creative Commons Attribution 4.0 International License",
      "identifier": "CC-BY-4.0"
  }
}

Using PropertyValue to capture SPDX expressions

By being more explicit and using PropertyValue identifiers, we can better include SPDX expressions, even if there is no license file or it is a local specialization:

{ "@id": "",
  "@type": "SoftwareSourceCode",
  "license": {
      "@type": "CreativeWork",
      "name": "MIT or AGPL 3.0 (or later)",
      "description": "Dual-licensed as MIT or AGPL 3.0",
      "isBasedOn": [],
      "identifier": {
          "@type": "PropertyValue",
          "name": "SPDX-License-Identifier",
          "value": "MIT OR AGPL-3.0+",
          "propertyID": ""
      }
  }
}

We see that the SPDX expression MIT OR AGPL-3.0+ is captured. I threw in isBasedOn for good measure, although this would play double-duty with the SPDX license expression without its flexibility or rigidity.

Here the propertyID is a URL that explains the SPDX expressions well, and instead of just SPDX I used SPDX-License-Identifier as the name to match what they recommend for code comments. (Not sure if propertyID here should be {@id: instead.)

This is much more precise, but unfortunately it becomes a bit too nested and repetitive when applied to the base case of just using SPDX URIs directly:

{
  "@context": "",
  "@type": "SoftwareApplication",
  "name": "ArduinoJson",
  "license": {
      "@id": "",
      "@type": "CreativeWork",
      "name": "MIT",
      "identifier": {
          "@type": "PropertyValue",
          "name": "SPDX-License-Identifier",
          "value": "MIT",
          "propertyID": ""
      }
  }
}
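For producers, the nesting above could at least be hidden behind a small helper. This is a Python sketch only: `license_entry` and its parameters are hypothetical, and the propertyID value is left for the caller to supply because the thread has not settled on one.

```python
import json


def license_entry(spdx_expression, name=None, license_id=None, property_id=None):
    """Build a CreativeWork license node carrying an SPDX expression as a PropertyValue."""
    identifier = {
        "@type": "PropertyValue",
        "name": "SPDX-License-Identifier",
        "value": spdx_expression,
    }
    if property_id:
        identifier["propertyID"] = property_id
    entry = {
        "@type": "CreativeWork",
        "name": name or spdx_expression,
        "identifier": identifier,
    }
    if license_id:
        # Put @id first so the serialized JSON-LD reads like the hand-written examples.
        entry = {"@id": license_id, **entry}
    return entry


if __name__ == "__main__":
    software = {
        "@type": "SoftwareApplication",
        "name": "ArduinoJson",
        "license": license_entry("MIT"),
    }
    print(json.dumps(software, indent=2))
```

The same helper covers the dual-license case by passing an expression such as "MIT OR AGPL-3.0+" with a human-readable name.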

Discussion across GitHub

In schemaorg/schemaorg#1928 @njh concludes that the SPDX URI should be used directly as @id

In seek4science/seek#456 we tried to explore this further, as we had initially abused license as a text field with an implied SPDX identifier looked up using JSON. We need to distinguish between "data license" and "software license". That issue suggests the PropertyValue expanded form shown above. Discussions include @fbacall @stuzart @alaninmcr

In radiantearth/stac-spec#378 @mojodna @gkellogg @m-mohr are using the variant in JSON-LD

In galaxyproject/galaxy#10408 @jmchilton and @nsoranzo are referencing SPDX from Galaxy workflows, unclear which identifier form (custom YAML?)

In earthcubearchitecture-project418/p418Docs#6 we see @mbjones suggest a PropertyValue approach as above, but less verbose, with propertyID: SPDX as a string, since propertyID can be either Text or URL.

The Citation File Format (CFF) (custom YAML) uses license_url: and license: "MIT" - see for instance citation-file-format/cff-converter-python#25 by @jspaaks and citation-file-format/citation-file-format#105 with @thomaskrause

mbjones commented 3 years ago

In the guidelines for Dataset metadata, we recommend using SPDX URIs from the RDF files:

In CodeMeta, which is an extension for software metadata, we also recommend using SPDX: codemeta/codemeta#67 although the guidelines are not prescriptive.

m-mohr commented 3 years ago

Some quick thoughts:

bact commented 7 months ago

In the guidelines for Dataset metadata, we recommend using SPDX URIs from the RDF files:

In CodeMeta, which is an extension for software metadata, we also recommend using SPDX: codemeta/codemeta#67 although the guidelines are not prescriptive.

Just a note from Codemetapy:

"For schema:license, full SPDX URIs are used where possible."