Closed migurski closed 8 years ago
I'm in favor of this! I don't feel strongly about the format of the JSON blob, but this looks good.
Is there some commonly accepted definition of what "Attribution" and "Share-Alike" means that we're implying here? Perhaps the Creative Commons definitions? http://creativecommons.org/licenses/
Are there different degrees of Share-Alike? Wondering if there are some Share-Alike licenses that basically mean "your copy of our database has to be shared", but doesn't extend to a more viral "anything you make with this data must also be Share-Alike".
Attribution seems really clear to me.
Share-alike is much more slippery—I’m still not sure if it seems safer to assume yes or no on this one.
Tagging @sbma44 and @iandees for particular input on this.
Perhaps we need a "license unknown" category in the output files.
I like this idea and the format of the json blob. I'm a tad bit worried about us interpreting licenses and boiling them down into a couple attributes. Maybe if we're clear in docs that this is our interpretation and might not want to be your interpretation?
Yes, I agree with the idea that this is our interpretation.
Would it be fair to say that anything in OA can be used for derived works? That’s really the crux of the SA flag: it governs what you can do with those works, but we assert that anything in OA should be usable for new data products.
I don't think there's any value in us collecting data that cannot be used at all for derived works. (Is there any?) The challenge is what restrictions the license might place on derived works. Share-Alike provisions require derived works (sometimes?) make the whole derived work share-alike. Non-Commercial provisions forbid commercial use. I think we should include sources with SA or NC provisions but very clearly delimit them.
Calling out Non-Commercial explicitly, that's also a common license provision in some circles. Do we need it for OA data sources?
Sounds like we might, and it would map cleanly to the three flags in CC licenses. There’s eight possible combinations, but CC documents just six.
…and I see that two of them include No Derivatives, which I think we can exclude. We would have five possible kinds of downloads:
Three without NC:
Yeah, three flags in the source documents (one for each feature: BY, SA, NC). Then we can present a list of collections however makes sense based on which license features are most common.
What do we think about presenting them as positives in the download descriptions:
Big fan of this.
By the way, CreativeCommons Rights Relation & ccREL for w3c & OKFN Open Licenses.
So using CreativeCommons NS would perhaps be:
"license": {
"cc:permits": ["cc:Reproduction", "cc:Distribution"],
"cc:prohibits": ["cc:CommercialUse"]
}
I like the “permits” vs. “prohibits” language, that’s great. The cc:
namespaces might not be entirely appropriate since we’re not technically dealing with CC, but linking to them for the spirit could be enough.
Super-late to this, but will say:
With all that said I think proceeding is great, but we should make the disclaimers totally unavoidable. I'd hate for anyone to think we're taking formal positions on the usability of the data/offering legal advice.
Makes sense, thank you! I’ll move forward, and I’ll make sure that disclaimers are reflected in the download page design.
I’m making a series of changes here that introduce the new dictionary syntax, with backwards-compatible support for simple strings. It’s just URLs and strings so far; nothing about attribution or license properties yet. The new behavior is released in Machine 2.6.0.
Next step: there are a large number of existing sources with attribution
tags. That’s the first explicit license flag we should support. I think we can use the presence of the tag as an implicit hint about the required attribution of the source, going back to the sources later on with explicit flags.
Here’s where we are at the moment with license tag documentation, FYI: https://github.com/openaddresses/openaddresses/blob/633cd4c/CONTRIBUTING.md#optional-tags
A couple things:
Next step: there are a large number of existing sources with attribution tags. That’s the first explicit license flag we should support. I think we can use the presence of the tag as an implicit hint about the required attribution of the source, going back to the sources later on with explicit flags
- @migurski : On at least a few occasions I know that I've added and
attribution
tag as a simple courtesy to the data owners and not because attribution was required under the license terms. I believe others have done the same. I would recommend against blindly converting all existing sources with this tag to the newlicense
tag structure, unless you're OK with this.- Since license text is sometimes included with the source data, shouldn't the
license
tag structure allow for paths in thedata
file that would allow machine to extract this from a single download?
@geobrando: I’m treating the attribution
tag as an implied requirement only in the absence of other information, and I’m not updating any of the sources to make this explicit. It should affect only collections without a clear flag, and I believe it will be safe. Does that sound okay to you?
Say more about the paths in the data. Are you thinking that we might point to some text file included in a zip archive?
Comments from @NelsonMinar suggest that splitting attribution downloads doesn’t make sense, but that splitting share-alike ones does. I’m going to put https://github.com/openaddresses/machine/issues/236 and https://github.com/openaddresses/machine/pull/248 on ice for a little while, and introduce a share-alike flag first.
In https://github.com/openaddresses/machine/pull/254, missing share-alike license information is assumed to mean false
. Is this safe, or should it default to true
to be more cautious?
My gut reaction is to assume false
, simply because share-alike is so rare in the world we're dealing in. Right now do we have any sources that require it? Six months ago I bet we were explicitly not including them at all.
Better yet would be to not assume anything, and either reject a source that doesn't specify or else have some lint tool that's reporting sources missing this info.
I’m thinking false
as well. There are a few sources that appear to have SA. I’ll merge the machine changes as they are, and get to work on a set of OA changes that will formally document this and modify some sources.
@migurski My concern was mainly fully deprecating the standalone attribution
tag and converting existing sources to license.attribution = true
, but in general I worry about using the presence of an attribution name
tag to imply that attribution is required. Maybe it's just a matter of the documentation making this clear.
Say more about the paths in the data. Are you thinking that we might point to some text file included in a zip archive?
Yeah. I think I got confused and thought there were plans to extract license text for dissemination using license.url
. But that isn't really feasible or typically necessary. But I believe I have seen some sources with terms that state that a copy of the license should be included whenever the data is disseminated. Can't recall where I saw this though.
Hm good point. I’ve augmented some of the data sets with explicit attribution: false
where possible, based on common licenses in https://github.com/openaddresses/openaddresses/pull/1408.
Right now, the only place where the attribution requirement appears is the collection license file. Should I maybe have it default to false instead? Is this dangerous? It’s a softer license term than share-alike.
After a few conversations, realizing that SA is the right license requirement to split downloads on. Going to stop openaddresses/machine#236 and openaddresses/machine#248 and create new issues to reflect this.
With the completion of these issues, I’d like to close this ticket:
Some remaining things that can be done separately:
I agree. Thanks for all your work on this, Mike!
:boom:
Based on the license principles discussion, I’d like to recommend a formal change to the contribution guidelines and our parsing code to support an optional extended description.
Currently, the license is documented as “a URL or string”, and supports both explicit links and implicit short strings:
Also valid:
We should support an additional expanded version of the license data, with true/false flags for license properties such as required attribution or share-alike:
The old forms will still be acceptable. In cases where attribution or share-alike are not explicitly defined, we would assume both are required.
If this proposal were accepted, here are the next steps: