theodi / shared

Repo that we use for non-repo-specific stories and other shared stuff.

Draft guide on machine-readable rights statements #7

Closed: JeniT closed this issue 11 years ago.

ldodds commented 11 years ago

This is in progress, but a working draft of the publisher guide is here:

https://github.com/theodi/open-data-licensing/blob/master/guides/publisher-guide.md

I still need to add the simple XML and JSON examples, and notes on including references from API responses.

The re-user guide will give some guidance on how to process the metadata, e.g. to build attribution links, etc.
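As an illustration of the kind of processing a re-user might do, here is a minimal sketch that pulls rights-statement fields out of a JSON-LD document in expanded form (every key is a full IRI and every value is an array of `@value` or `@id` objects). It assumes the ODRS namespace `http://schema.theodi.org/odrs#` and the property names `attributionText`, `attributionURL` and `copyrightNotice`; the function names are hypothetical, and the property names should be checked against the published vocabulary.

```python
from typing import Optional

ODRS = "http://schema.theodi.org/odrs#"  # assumed ODRS namespace IRI
FIELDS = ("attributionText", "attributionURL", "copyrightNotice")


def first_value(node: dict, prop: str) -> Optional[str]:
    """Return the first literal (@value) or IRI (@id) for an ODRS property."""
    for item in node.get(ODRS + prop, []):
        if "@value" in item:
            return item["@value"]
        if "@id" in item:
            return item["@id"]
    return None


def extract_rights(expanded: list) -> dict:
    """Collect ODRS fields from the first node that carries any of them."""
    for node in expanded:
        found = {f: first_value(node, f) for f in FIELDS}
        if any(v is not None for v in found.values()):
            return found
    return {}
```

A re-user can then build an attribution link from `attributionText` and `attributionURL` without needing a full RDF toolchain.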

ldodds commented 11 years ago

I've completed the first draft of the publisher guide. It provides background on the vocabulary and discusses how to use it when publishing Linked Data and RDFa, and how to refer to rights statements from APIs.

@JeniT can you take a look and let me know if you have any feedback?

https://github.com/theodi/open-data-licensing/blob/master/guides/publisher-guide.md

I'm going to work on the re-user guide next, which will just be some short notes on how to use and interpret the data. I don't think that is essential before we start publicising, though, so if you're happy with the guide and the vocab then I can get on with that too.

Note: I've not added any explicit mentions of the certificate. I assumed that there will be a separate document that discusses how to use, e.g., DCAT, ODRS, etc. to publish dataset metadata.

ldodds commented 11 years ago

I've completed the first draft of the re-user guide:

https://github.com/theodi/open-data-licensing/blob/master/guides/reusers-guide.md

JeniT commented 11 years ago

Two things:

  1. I am confused about the difference between copyrightNotice and attributionText. Is there a need for both, and why? Which should a reuser use where? I think this needs to be clearer.
  2. Could the guides start with the common case of people who are using data formats other than RDF? I suggest starting with RDFa in web pages, then spell out how that looks in RDF and say that RDF apps can just use that. Then I suggest looking at Link headers, and how to embed licence information in JSON APIs. I think we need to show what it looks like in "normal" JSON because web developers don't design around JSON-LD. If we can make it work with conventions in normal JSON plus a JSON-LD context supplied through a Link header, that would be ideal.
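A minimal sketch of the approach suggested in point 2: a plain JSON API response that carries licence information under ordinary keys, plus Link headers pointing at a rights statement and at a JSON-LD context. The key names (`license`, `attribution`), the example URLs, and the use of `rel="license"` for the rights statement are assumptions for illustration; the `http://www.w3.org/ns/json-ld#context` relation is the one JSON-LD 1.0 defines for supplying a context to plain JSON. The Link header parser is deliberately simplified.

```python
import json
import re
from typing import Dict, Tuple

# Hypothetical URLs: substitute the publisher's real rights statement and context.
RIGHTS_URL = "https://example.org/rights.ttl"
CONTEXT_URL = "https://example.org/context.jsonld"
# Link relation JSON-LD 1.0 defines for attaching a context to plain JSON.
JSONLD_CONTEXT_REL = "http://www.w3.org/ns/json-ld#context"


def build_response(payload: dict) -> Tuple[Dict[str, str], str]:
    """Return (headers, body) for a plain-JSON API response.

    Licence details sit in ordinary JSON keys so web developers can read them
    directly; the Link headers let RDF-aware clients find the rights statement
    and a JSON-LD context mapping the plain keys to vocabulary terms.
    """
    body = dict(payload)
    # Hypothetical key names; plain JSON has no fixed convention for these.
    body["license"] = RIGHTS_URL
    body["attribution"] = "Example Publisher"
    links = [
        f'<{RIGHTS_URL}>; rel="license"',
        f'<{CONTEXT_URL}>; rel="{JSONLD_CONTEXT_REL}"; type="application/ld+json"',
    ]
    headers = {"Content-Type": "application/json", "Link": ", ".join(links)}
    return headers, json.dumps(body)


# One "<url>; param; param" group; assumes no commas or semicolons inside values.
LINK_PART = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')


def parse_link_header(value: str) -> Dict[str, str]:
    """Map each rel value in a Link header to its target URL."""
    result: Dict[str, str] = {}
    for match in LINK_PART.finditer(value):
        url, params = match.group(1), match.group(2)
        rel = re.search(r'rel="([^"]*)"', params)
        if rel:
            for r in rel.group(1).split():
                result[r] = url
    return result
```

A client that only understands JSON reads `body["license"]`; a JSON-LD processor can instead fetch the context named in the Link header and interpret the same body as Linked Data.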

ldodds commented 11 years ago

re: copyrightNotice and attributionText.

I think attributionText is pretty clear: it's the text you use to build a link to the data publisher's preferred location. It's equivalent to the "attribution name" property in the Creative Commons vocabulary. However, I moved away from calling it attribution name as I don't think it's always just a name, e.g. "Wikipedia user community", etc.

The Creative Commons licenses all state that re-users should preserve any copyright notices that a publisher provides. But there's no vocabulary for providing such a copyright statement; the closest is the older dc:rights property. So I wanted to have separate properties for the attribution text and the copyright notice. This gives more flexibility in finding the bits of data required to format an attribution link.
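To make the flexibility argument concrete, here is a minimal sketch assuming a rights statement already parsed into three fields that mirror `attributionText`, `attributionURL` and `copyrightNotice`. The class and method names are hypothetical. Because the two texts are stored separately, an application can use the short attribution text as a link label where space is tight, and append the copyright notice where it is required.

```python
from dataclasses import dataclass
from html import escape
from typing import Optional


@dataclass
class RightsStatement:
    """Hypothetical container mirroring three ODRS properties."""
    attribution_text: str
    attribution_url: Optional[str] = None
    copyright_notice: Optional[str] = None

    def link(self) -> str:
        """Short HTML attribution link, labelled with the attribution text only."""
        label = escape(self.attribution_text)
        if self.attribution_url:
            return f'<a href="{escape(self.attribution_url)}">{label}</a>'
        return label

    def full(self) -> str:
        """Attribution link followed by the copyright notice, if there is one."""
        if self.copyright_notice:
            return f"{self.link()} {escape(self.copyright_notice)}"
        return self.link()
```

For example, `RightsStatement("Land Registry", "https://example.org/lr", "© Crown copyright 2013").full()` yields the link followed by the notice, while `.link()` alone gives a compact label.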

Copyright statements are often added to the attribution statements that publishers are requesting. For example the Land Registry historical price paid data suggests the following:

"Data produced by Land Registry © Crown copyright 2013"

By decomposing that into two properties, attribution text ("Land Registry") and copyright notice ("© Crown copyright 2013") then I can do things like:

I think if we end up with a single property then this encourages data publishers to create longer, unwieldy attribution statements.

I'm aware that existing usage varies widely but my argument is that having separate properties allows us to converge on sensible conventions whilst still encouraging simple attribution links.

I guess we need to balance how much you want to capture current snippets of attribution text against encouraging better practices. To use a different Land Registry example, their price paid data requests the following attribution text:

"This data covers the transactions received at Land Registry in the period [first working day of the month] to [last working day of the month]. © Crown copyright 2012."

In my view that's a particular style of dataset citation built up from various data elements (e.g. attribution text, dataset coverage, copyright notice).
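A sketch of that decomposition: the Land Registry citation quoted above can be rebuilt from an attribution text, a coverage period and a copyright notice. The template and function name are hypothetical, and dates are rendered as ISO 8601 to avoid locale-dependent month names. Only the coverage dates change from month to month; the other two parts can come straight from the rights statement.

```python
from datetime import date

# Hypothetical template reproducing the citation style quoted above.
TEMPLATE = ("This data covers the transactions received at {attribution} "
            "in the period {start} to {end}. {notice}.")


def cite(attribution: str, start: date, end: date, notice: str) -> str:
    """Build a dataset citation from separately stored elements."""
    return TEMPLATE.format(
        attribution=attribution,
        start=start.isoformat(),
        end=end.isoformat(),
        notice=notice,
    )
```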

re: restructuring the guides, I'll look at re-ordering them as you suggest. I'd prefer to keep the Turtle examples in there, though. Maybe it's just me, but I can't tell whether I've got my RDFa right unless I turn it back into something else.

JeniT commented 11 years ago

It's still not clear to me when, as either a publisher or re-user, I should use attributionText and when copyrightNotice, and when both. I think as a re-user I want to know (a) what I have to put on every page of my application (b) what I have to make available somewhere within the application and (c) what I can (if I choose) use to reference the data that I've incorporated into my application.

I predict that unless the values of the different properties are used in very different contexts when data is reused, people will put copyright notices in attribution text and vice versa. In your example above, you say "© Crown copyright 2013" is the copyright notice and "Land Registry" is attribution text, but in the example in the guide you have "Contains Ordnance Survey data © Crown copyright and database right 2013. Contains Royal Mail data © Royal Mail copyright and database right 2013. Contains National Statistics data © Crown copyright and database right 2013." as a copyright notice and "Ordnance Survey" as attribution text.

I'm not saying it's not a useful distinction to make, just that I think it's currently confusing. If you were to give a really simple rule of thumb (e.g. the copyright notice contains a copyright symbol and the attribution text does not), then that might help.
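The rule of thumb suggested here is simple enough to check mechanically. A minimal sketch of a hypothetical publisher-side lint: it flags attribution text that contains a copyright marker and copyright notices that lack one. Treating "(c)" and the word "copyright" as equivalent to "©" is an assumption added here, not part of the suggestion.

```python
import re

# "©", "(c)" or the word "copyright" counts as a copyright marker (assumption).
COPYRIGHT_MARK = re.compile(r"©|\(c\)|\bcopyright\b", re.IGNORECASE)


def lint(attribution_text: str, copyright_notice: str) -> list:
    """Return warnings where the two properties look swapped or mixed."""
    warnings = []
    if COPYRIGHT_MARK.search(attribution_text):
        warnings.append("attribution text contains a copyright marker")
    if copyright_notice and not COPYRIGHT_MARK.search(copyright_notice):
        warnings.append("copyright notice has no copyright marker")
    return warnings
```

Under this check, the Ordnance Survey example passes, while the combined "Data produced by Land Registry © Crown copyright 2013" string would be flagged if used as attribution text.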

ldodds commented 11 years ago

@JeniT I've yet to restructure the technical examples, but I've expanded on the text around copyright notices in both guides. Can you let me know if this clarifies things? I've tried to include some guidance on good and bad examples, and suggestions on what should go in each property. Similarly, I've given developers some indication of how to use the data.

(btw, I've been noodling on a little library, based on Data Kitten, to help build various kinds of attribution links and colophon pages automatically. I was thinking of putting some time into that if we can get the vocabulary agreed.)
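A rough sketch of the colophon idea, written independently of Data Kitten (none of its API is used or assumed here): given rights statements for several datasets, list one attribution link per source and show each distinct copyright notice once. The input shape (dicts with `attribution_text`, `attribution_url` and `copyright_notice` keys) and the function name are hypothetical.

```python
from html import escape


def colophon(statements: list) -> str:
    """Build a simple HTML colophon from several rights statements."""
    links, notices = [], []
    for s in statements:
        label = escape(s["attribution_text"])
        url = s.get("attribution_url")
        item = f'<a href="{escape(url)}">{label}</a>' if url else label
        if item not in links:
            links.append(item)
        notice = s.get("copyright_notice")
        # Several datasets may share a notice (e.g. Crown copyright); show it once.
        if notice and notice not in notices:
            notices.append(notice)
    parts = ["<h2>Data sources</h2>", "<ul>"]
    parts += [f"<li>{item}</li>" for item in links]
    parts.append("</ul>")
    parts += [f"<p>{escape(n)}</p>" for n in notices]
    return "\n".join(parts)
```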

I'm continuing to work on restructuring the document, but would appreciate feedback on the copyright-related text. Can we also close down #6 if you're happy with the basic vocab?

ldodds commented 11 years ago

I've revised the guides to:

I think this covers all of the essential elements and, like the vocabulary, is ready for a wider review.

@JeniT are you happy for me to close down this and the related vocab issue? I'd like to collect further feedback as issues on the open-data-licensing project.

Once the blog post I drafted is published I'll circulate pointers to various groups and will request feedback.

It's possible that more could be done around providing formatting guidance and tooling for creating attribution links, but this could follow on later after getting feedback. That might be a good time to consider whether we want to also encourage structured markup for attribution links, so they can be harvested.
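If structured markup for attribution links were encouraged, one possibility is an RDFa-annotated link that a crawler could harvest back into the same properties. A minimal sketch, assuming the ODRS prefix `http://schema.theodi.org/odrs#` and the `odrs:attributionText` and `odrs:attributionURL` properties; the function name and markup pattern are illustrative only and should be validated with an RDFa processor.

```python
from html import escape

ODRS_PREFIX = "odrs: http://schema.theodi.org/odrs#"  # assumed namespace


def rdfa_attribution_link(rights_url: str, text: str, url: str) -> str:
    """HTML link whose RDFa attributes restate the attribution properties.

    With both @rel and @property on one element, RDFa 1.1 takes @href as the
    object of @rel and the element's text as the literal for @property, both
    about the rights statement named in @about.
    """
    return (
        f'<a prefix="{ODRS_PREFIX}" about="{escape(rights_url)}" '
        f'rel="odrs:attributionURL" href="{escape(url)}" '
        f'property="odrs:attributionText">{escape(text)}</a>'
    )
```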

JeniT commented 11 years ago

@ldodds we're out of office at the moment so I can't take a look right now, but I have published the blog post, so please circulate for feedback.