radiantearth / stac-spec

SpatioTemporal Asset Catalog specification - making geospatial assets openly searchable and crawlable
https://stacspec.org
Apache License 2.0

License field #133

Closed m-mohr closed 6 years ago

m-mohr commented 6 years ago

I am just comparing standards and how they handle certain aspects, e.g. the license field. When looking into STAC I found that there are several inconsistencies in the documentation.

In the JSON spec it says:

Item's license name based on SPDX License List or following guidelines for non-SPDX licenses

extensions/transaction/transaction-fragment.yaml / json-spec/json-schema/stac-item.json / api-spec/STAC-fragment.yaml define the field as follows:

Data license name based on SPDX License List

The JSON spec does not mention what the guidelines for non-SPDX licenses are and looking through the SPDX web page I couldn't really find an answer. Additionally, the other mentioned schemas in the repository don't mention anything other than SPDX licenses. So what to do for proprietary licenses?

The catalog (static-catalog/json-schema/catalog.json) even specifies a new JSON object for licenses, which can be used instead of the SPDX licenses. It doesn't look well thought out though, e.g. giving shortname a "format": "email" is confusing:

    "license": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string"
        },
        "shortname": {
          "type": "string",
          "format": "email"
        },
        "copyright": {
          "type": "string"
        },
        "link": {
          "type": "string",
          "format": "uri"
        }
      },
      "required": ["name", "link"]
    },

Another question regarding the license in the catalog: Is it the data license or the catalog metadata license?

One of the examples uses it completely differently: json-spec/examples/digitalglobe-sample.json contains (C) COPYRIGHT 2016 DigitalGlobe, Inc., Longmont CO USA 80503.

That should be sorted out and be made uniform across the spec.

Suggestion / Alternative 1: I'd suggest changing the license definition and making it an object.

Fields:

Example for SPDX licenses:

{
  "spdx": "Apache-2.0"
}

Or a longer form, replicating information from the SPDX list:

{
  "name": "Apache License 2.0",
  "spdx": "Apache-2.0",
  "url": "http://www.apache.org/licenses/LICENSE-2.0"
}

Example for non-SPDX licenses:

{
  "name": "CeCILL-B Free Software License",
  "url": "http://www.cecill.info/licences/Licence_CeCILL-B_V1-en.html"
}

One thing that bothers me here is that the url (or should it be link?) could also be in the links section with the rel type "license", but that would spread information about licensing in two different places.

Alternative 2: If we don't want to change the current specification much, but want to be flexible: Allow only SPDX license identifiers in the license field and require users to add a link with rel license to the links for other licenses.
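A minimal sketch of how a client could check Alternative 2. It assumes the license identifier sits in a top-level `license` field and that `links` is a list of objects with `rel` and `href`. The function name and the small `KNOWN_SPDX_IDS` set are hypothetical; the set is only a stand-in for the full SPDX License List.

```python
# Hypothetical stand-in for the full SPDX License List.
KNOWN_SPDX_IDS = {"Apache-2.0", "CC-BY-4.0", "CC-BY-SA-4.0", "ODbL-1.0"}


def license_problems(item):
    """Return a list of problems with an item's licensing info (empty if fine)."""
    problems = []
    license_id = item.get("license")
    # Links that point at a license document.
    license_links = [
        link for link in item.get("links", []) if link.get("rel") == "license"
    ]

    if license_id is None:
        # Non-SPDX case: the license must be reachable through a link instead.
        if not license_links:
            problems.append("no SPDX identifier and no link with rel 'license'")
    elif license_id not in KNOWN_SPDX_IDS:
        problems.append(f"'{license_id}' is not a known SPDX identifier")
    return problems


spdx_item = {"license": "Apache-2.0", "links": []}
linked_item = {
    "links": [
        {
            "rel": "license",
            "href": "http://www.cecill.info/licences/Licence_CeCILL-B_V1-en.html",
        }
    ]
}
print(license_problems(spdx_item))
print(license_problems(linked_item))
```

With this approach the `license` field stays a plain string, at the cost of having to look in two places for non-SPDX licenses.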

What do other specifications/standards do?

m-mohr commented 6 years ago

Another interesting idea: What about allowing an array of licenses to be specified? I just saw the Planet Disaster Data Catalog and it includes:

Imagery is provided under Creative Commons licenses, free of charge, with either CC-BY-SA or CC-BY-NC.

For dual-licensed data, there is no good way to describe them in STAC, but it could be an array:

{
  "name": "Planet Disaster Data",
  "license": [
    "CC-BY-SA",
    "CC-BY-NC"
  ],
  ...
}

mojodna commented 6 years ago

SPDX supports multiple licenses via AND and/or OR, so perhaps that's sufficient?
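As a rough illustration of what a client would have to do with such an expression, here is a minimal sketch that pulls the license identifiers out of a simple SPDX expression. It is not a full SPDX parser: it only recognises the uppercase `AND`, `OR` and `WITH` operators and parentheses, does not validate the identifiers, and the function name is hypothetical.

```python
import re


def license_ids(expression):
    """Return the license identifiers used in a simple SPDX expression."""
    # Identifiers consist of letters, digits, '.', '+' and '-';
    # parentheses and whitespace only separate tokens.
    tokens = re.findall(r"[A-Za-z0-9.+-]+", expression)
    ids = []
    skip_next = False
    for token in tokens:
        if skip_next:
            # The token after WITH names a license exception, not a license.
            skip_next = False
        elif token == "WITH":
            skip_next = True
        elif token not in ("AND", "OR"):
            ids.append(token)
    return ids


print(license_ids("CC-BY-SA-4.0 OR CC-BY-NC-4.0"))
print(license_ids("(MIT AND Apache-2.0) OR CC0-1.0"))
```

A dual-licensed catalog could then use a single string such as `"CC-BY-SA-4.0 OR CC-BY-NC-4.0"` instead of an array.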

m-mohr commented 6 years ago

This was discussed in STAC sprint #3.

We agreed to allow SPDX licenses in the license (or license_name?) field and to add an additional license_url field. license_url should be set only when license is set to "proprietary". I think license should just be a valid SPDX expression to allow multiple licenses. I am not sure how this fits with our custom term "proprietary" though. SPDX uses the strange name NOASSERTION for that behaviour.
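A minimal sketch of that rule, assuming both fields live in the same object (called `properties` here). The function name is hypothetical. Treating a missing license_url on a "proprietary" license as a problem is an extra assumption; the conclusion above only says when license_url may be set.

```python
def license_url_problem(properties):
    """Return a description of the problem, or None if the fields are consistent."""
    license_value = properties.get("license")
    has_url = bool(properties.get("license_url"))

    if license_value != "proprietary" and has_url:
        # Stated rule: license_url only accompanies "proprietary".
        return "license_url is set although license is not 'proprietary'"
    if license_value == "proprietary" and not has_url:
        # Assumption: a proprietary license without a URL is not useful.
        return "license is 'proprietary' but license_url is missing"
    return None


print(license_url_problem({"license": "CC-BY-SA-4.0"}))
print(license_url_problem({"license": "proprietary"}))
```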

Some interesting SPDX tools I just found:

These tools are probably mostly interesting for clients, e.g. STAC browser.

Others handling that issue:

PR still to be done by @jeffnaus or me.

matthewhanson commented 6 years ago

So now I'm wondering if we need license_url at all, and instead can just include license as a link with rel="license"

cholmes commented 6 years ago

I think I like link with rel=license. Especially if we do have a license name as a property.

And yes, I think we start with 'proprietary' for non-SPDX licenses. I don't think most imagery is 'unlicensed'; it's under a non-free license. We might choose a subset of SPDX - just the Creative Commons, ODbL and public access ones, since software licenses aren't really applicable.

We could also say 'custom' and require that the link to the license is there if you use 'custom'. Eventually may be good to say some 'types' of proprietary licenses, like 'derivatives allowed'. But defining that list is far out of the scope of STAC (I've wanted to work on it before, since it is really needed)

m-mohr commented 6 years ago

I do see advantages and disadvantages in the rel=license thing:

+1 on proprietary

+1 on SPDX licenses, but definitely -1 on a subset of SPDX licenses within STAC. This complicates things and we would need special treatment in the spec, which should be avoided to keep things simple. Licenses that are not applicable will just not be used anyway. Most of the time the license is set before a STAC item is made, so users will not go through the SPDX list and choose one, but just look up their pre-defined license identifier. I don't see any benefit in subsetting.

matthewhanson commented 6 years ago

I think there will be other cases where info is split up so I don't see that as much of a problem.

With Datasets we'd probably want a Dataset field indicating the dataset it belongs to, but then also a link to it, so it's a similar case.

m-mohr commented 6 years ago

Either I don't understand your last paragraph or some occurrences of "dataset" should be replaced by "license"... I guess you are basically saying: Make Item and Dataset consistent in how they handle licenses and their URLs.

cholmes commented 6 years ago

Ah, good point on SPDX licenses. I wasn't thinking of doing a full constraint on them. Maybe just more a 'recommendation' - these are good ones that are relevant to geospatial information.

As for the disadvantage, @mojodna seemed less compelled by arguments that 'it's harder for tooling' (like links vs dicts), since it is just a bit of code that has to be written once.

m-mohr commented 6 years ago

Sure, it is not a big thing. I'll make a PR next week.

m-mohr commented 6 years ago

This is included in the restructuring PR #202.

m-mohr commented 6 years ago

This should be completed with the latest PR.