openaddresses / openaddresses-ops

Issues-only repo for discussion of operational considerations for OA

Proposal: expand supported license data #7

Closed migurski closed 8 years ago

migurski commented 8 years ago

Based on the license principles discussion, I’d like to recommend a formal change to the contribution guidelines and our parsing code to support an optional extended description.

Currently, the license is documented as “a URL or string”, and supports both explicit links and implicit short strings:

"license": "http://geonb.snb.ca/downloads/documents/geonb_license_e.pdf"

Also valid:

"license": "CC-BY-SA"

We should support an additional expanded version of the license data, with true/false flags for license properties such as required attribution or share-alike:

"license": {
    "url": "http://geonb.snb.ca/downloads/documents/geonb_license_e.pdf",
    "attribution-string": "GeoNB – www.snb.ca/geonb",
    "attribution": true,
    "share-alike": false
}

The old forms will still be acceptable. In cases where attribution or share-alike are not explicitly defined, we would assume both are required.
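As a rough illustration of how the three accepted forms could be read into one shape, here is a minimal sketch. It is not the Machine implementation: the function name `normalize_license`, the `name` key for short strings, and the `assume_required` parameter are all hypothetical, and the fallback to "required" simply follows the assumption stated above.

```python
def normalize_license(value, assume_required=True):
    """Normalize a source's "license" value into one dictionary shape.

    Hypothetical helper: accepts a URL string, a short license string,
    or the expanded dictionary form proposed in this issue.
    """
    result = {
        "url": None,
        "name": None,  # hypothetical key for short strings such as "CC-BY-SA"
        "attribution-string": None,
        # Flags that are not explicitly given fall back to the assumption.
        "attribution": assume_required,
        "share-alike": assume_required,
    }
    if value is None:
        return result
    if isinstance(value, str):
        # Old form: either an explicit link or an implicit short string.
        key = "url" if value.startswith(("http://", "https://")) else "name"
        result[key] = value
    elif isinstance(value, dict):
        # New form: copy known keys, keeping defaults for missing flags.
        for key in ("url", "attribution-string"):
            if key in value:
                result[key] = value[key]
        for flag in ("attribution", "share-alike"):
            if flag in value:
                result[flag] = bool(value[flag])
    else:
        raise TypeError("license must be a string or a dictionary")
    return result
```

Passing `assume_required=False` would give the opposite fallback, should a less cautious default be preferred.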

If this proposal were accepted, here are the next steps:

  1. Write and test machine code to support the structure above.
  2. Deploy new machine code.
  3. Update contribution guide to reflect newly-supported structure.
  4. Research licenses of existing sources to determine their properties.
  5. Expand existing sources to use new license structure.
  6. Determine whether to deprecate older URL/string structure.

NelsonMinar commented 8 years ago

I'm in favor of this! I don't feel strongly about the format of the JSON blob, but this looks good.

Is there some commonly accepted definition of what "Attribution" and "Share-Alike" mean that we're implying here? Perhaps the Creative Commons definitions? http://creativecommons.org/licenses/

Are there different degrees of Share-Alike? Wondering if there are some Share-Alike licenses that basically mean "your copy of our database has to be shared", but don't extend to a more viral "anything you make with this data must also be Share-Alike".

migurski commented 8 years ago

Attribution seems really clear to me.

Share-alike is much more slippery—I’m still not sure if it seems safer to assume yes or no on this one.

migurski commented 8 years ago

Tagging @sbma44 and @iandees for particular input on this.

NelsonMinar commented 8 years ago

Perhaps we need a "license unknown" category in the output files.

iandees commented 8 years ago

I like this idea and the format of the JSON blob. I'm a tad bit worried about us interpreting licenses and boiling them down into a couple of attributes. Maybe if we're clear in the docs that this is our interpretation and might not match your interpretation?

migurski commented 8 years ago

Yes, I agree with the idea that this is our interpretation.

migurski commented 8 years ago

Would it be fair to say that anything in OA can be used for derived works? That’s really the crux of the SA flag: it governs what you can do with those works, but we assert that anything in OA should be usable for new data products.

NelsonMinar commented 8 years ago

I don't think there's any value in us collecting data that cannot be used at all for derived works. (Is there any?) The challenge is what restrictions the license might place on derived works. Share-Alike provisions (sometimes?) require that the whole derived work be share-alike. Non-Commercial provisions forbid commercial use. I think we should include sources with SA or NC provisions but very clearly delimit them.

NelsonMinar commented 8 years ago

Calling out Non-Commercial explicitly, that's also a common license provision in some circles. Do we need it for OA data sources?

migurski commented 8 years ago

Sounds like we might, and it would map cleanly to the three flags in CC licenses. There are eight possible combinations, but CC documents just six.

migurski commented 8 years ago

…and I see that two of them include No Derivatives, which I think we can exclude. We would have five possible kinds of downloads:

Three without NC:

NelsonMinar commented 8 years ago

Yeah, three flags in the source documents (one for each feature: BY, SA, NC). Then we can present a list of collections however makes sense based on which license features are most common.
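A minimal sketch of what grouping sources by those three flags might look like. The `non-commercial` key, the function names, and the choice to treat a missing flag as false are assumptions made for illustration only.

```python
from collections import defaultdict

# Flag names in the order BY, SA, NC. "non-commercial" is a hypothetical key.
FLAGS = ("attribution", "share-alike", "non-commercial")


def collection_key(license_info):
    """Return a (BY, SA, NC) tuple of booleans for one source's license."""
    # Assumption: a flag that is not present counts as False here.
    return tuple(bool(license_info.get(flag, False)) for flag in FLAGS)


def group_by_license(sources):
    """Group source names by their combination of license flags."""
    groups = defaultdict(list)
    for name, license_info in sources.items():
        groups[collection_key(license_info)].append(name)
    return dict(groups)
```

Counting the members of each group would show which combinations are common enough to deserve their own download.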

migurski commented 8 years ago

What do we think about presenting them as positives in the download descriptions:

  1. Share-alike → Any License Allowed.
  2. Noncommercial → Commercial Use Allowed.
  3. Attribution → [whatever the opposite of attribution would be]

ajturner commented 8 years ago

Big fan of this.

By the way, CreativeCommons Rights Relation & ccREL for w3c & OKFN Open Licenses.

So using CreativeCommons NS would perhaps be:

"license": {
 "cc:permits": ["cc:Reproduction", "cc:Distribution"],
 "cc:prohibits": ["cc:CommercialUse"]
}

migurski commented 8 years ago

I like the “permits” vs. “prohibits” language, that’s great. The cc: namespaces might not be entirely appropriate since we’re not technically dealing with CC, but linking to them for the spirit could be enough.

sbma44 commented 8 years ago

Super-late to this, but will say:

With all that said I think proceeding is great, but we should make the disclaimers totally unavoidable. I'd hate for anyone to think we're taking formal positions on the usability of the data/offering legal advice.

migurski commented 8 years ago

Makes sense, thank you! I’ll move forward, and I’ll make sure that disclaimers are reflected in the download page design.

migurski commented 8 years ago

I’m making a series of changes here that introduce the new dictionary syntax, with backwards-compatible support for simple strings. It’s just URLs and strings so far; nothing about attribution or license properties yet. The new behavior is released in Machine 2.6.0.

migurski commented 8 years ago

Next step: there are a large number of existing sources with attribution tags. That’s the first explicit license flag we should support. I think we can use the presence of the tag as an implicit hint about the required attribution of the source, going back to the sources later on with explicit flags.

migurski commented 8 years ago

Here’s where we are at the moment with license tag documentation, FYI: https://github.com/openaddresses/openaddresses/blob/633cd4c/CONTRIBUTING.md#optional-tags

geobrando commented 8 years ago

A couple things:

Next step: there are a large number of existing sources with attribution tags. That’s the first explicit license flag we should support. I think we can use the presence of the tag as an implicit hint about the required attribution of the source, going back to the sources later on with explicit flags

  1. @migurski: On at least a few occasions I know that I've added an attribution tag as a simple courtesy to the data owners and not because attribution was required under the license terms. I believe others have done the same. I would recommend against blindly converting all existing sources with this tag to the new license tag structure, unless you're OK with this.
  2. Since license text is sometimes included with the source data, shouldn't the license tag structure allow for paths in the data file that would allow machine to extract this from a single download?

migurski commented 8 years ago

@geobrando: I’m treating the attribution tag as an implied requirement only in the absence of other information, and I’m not updating any of the sources to make this explicit. It should affect only collections without a clear flag, and I believe it will be safe. Does that sound okay to you?
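Roughly, the precedence described here could be sketched as follows. This is only an illustration, not the code in Machine: the function name, the top-level `attribution` tag name, and the `default` parameter are assumptions.

```python
def attribution_required(source, default=None):
    """Decide whether a source requires attribution.

    Hypothetical helper. Returns True or False, or `default` when the
    source carries no information either way.
    """
    license_info = source.get("license")
    # 1. An explicit flag in the expanded license dictionary always wins.
    if isinstance(license_info, dict) and "attribution" in license_info:
        return bool(license_info["attribution"])
    # 2. Otherwise, a non-empty attribution tag is an implied requirement.
    if source.get("attribution"):
        return True
    # 3. No information at all: leave the decision to the caller.
    return default
```

With this ordering, a courtesy attribution tag can be overridden by adding an explicit `"attribution": false` to the license dictionary, without removing the tag itself.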

Say more about the paths in the data. Are you thinking that we might point to some text file included in a zip archive?

migurski commented 8 years ago

Comments from @NelsonMinar suggest that splitting attribution downloads doesn’t make sense, but that splitting share-alike ones does. I’m going to put https://github.com/openaddresses/machine/issues/236 and https://github.com/openaddresses/machine/pull/248 on ice for a little while, and introduce a share-alike flag first.

migurski commented 8 years ago

In https://github.com/openaddresses/machine/pull/254, missing share-alike license information is assumed to mean false. Is this safe, or should it default to true to be more cautious?

NelsonMinar commented 8 years ago

My gut reaction is to assume false, simply because share-alike is so rare in the world we're dealing in. Right now do we have any sources that require it? Six months ago I bet we were explicitly not including them at all.

Better yet would be to not assume anything, and either reject a source that doesn't specify or else have some lint tool that's reporting sources missing this info.
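Something like the following could serve as a starting point for such a lint check. It is a sketch only: the function name is made up, and it assumes sources are already loaded as dictionaries keyed by a source name.

```python
def missing_share_alike(sources):
    """Return the sorted names of sources with no explicit share-alike flag.

    Hypothetical lint helper: a source passes only if its license is a
    dictionary that contains a "share-alike" key, whatever its value.
    """
    missing = []
    for name, source in sorted(sources.items()):
        license_info = source.get("license")
        has_flag = isinstance(license_info, dict) and "share-alike" in license_info
        if not has_flag:
            missing.append(name)
    return missing
```

A report built from this list would leave the choice between rejecting a source and merely warning about it to whoever runs the check.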

migurski commented 8 years ago

I’m thinking false as well. There are a few sources that appear to have SA. I’ll merge the machine changes as they are, and get to work on a set of OA changes that will formally document this and modify some sources.

geobrando commented 8 years ago

@migurski My concern was mainly fully deprecating the standalone attribution tag and converting existing sources to license.attribution = true, but in general I worry about using the presence of an attribution name tag to imply that attribution is required. Maybe it's just a matter of the documentation making this clear.

Say more about the paths in the data. Are you thinking that we might point to some text file included in a zip archive?

Yeah. I think I got confused and thought there were plans to extract license text for dissemination using license.url. But that isn't really feasible or typically necessary. But I believe I have seen some sources with terms that state that a copy of the license should be included whenever the data is disseminated. Can't recall where I saw this though.

migurski commented 8 years ago

Hm good point. I’ve augmented some of the data sets with explicit attribution: false where possible, based on common licenses in https://github.com/openaddresses/openaddresses/pull/1408.

Right now, the only place where the attribution requirement appears is the collection license file. Should I maybe have it default to false instead? Is this dangerous? It’s a softer license term than share-alike.

migurski commented 8 years ago

After a few conversations, realizing that SA is the right license requirement to split downloads on. Going to stop openaddresses/machine#236 and openaddresses/machine#248 and create new issues to reflect this.

migurski commented 8 years ago

With the completion of these issues, I’d like to close this ticket:

Some remaining things that can be done separately:

iandees commented 8 years ago

I agree. Thanks for all your work on this, Mike!

migurski commented 8 years ago

:boom: