rd-alliance / FAIR-data-maturity-model-WG

https://www.rd-alliance.org/group/fair-data-maturity-model-wg/case-statement/fair-data-maturity-model-wg-case-statement

Indicators for R1.1: (meta)data are released with a clear and accessible data usage licence #27

Closed makxdekkers closed 5 years ago

makxdekkers commented 5 years ago

image

makxdekkers commented 5 years ago

Points raised in online meeting 3 on 18 June 2019

keithjeffery commented 5 years ago

I suggest this is insufficient. A licence is most likely human-readable and not machine-understandable. For autonomic interoperability it is necessary to extract from the licence assertions or rules in logic that can be used to determine whether, at this time and from this place, this software acting on behalf of this user from this organisation in this role can access (and conditionally perform other operations on) the asset and, if so, what is recorded about the access (e.g. citation, accreditation, audit logging, provenance, curation).

makxdekkers commented 5 years ago

@keithjeffery These are indeed important aspects of a licence. However, I think we also need to be realistic. I would say that possibly very few existing licences in use for research data provide this level of detail. There is some of this in ccRel and in ODRL but I don't know how widespread these are. Are there detailed descriptions of permissions and obligations for commonly used licences (e.g. CC-BY) publicly available?

Would including these very detailed requirements in an indicator not make it too difficult for any data to be considered FAIR? I guess we don't want to end up in a situation that someone says "my data is CC-BY" and the evaluation concludes that the data, because of that, is not FAIR. It would make FAIRness hard to achieve.

Could we enumerate a smaller set of crucial licence information, plus useful, but not mandatory extensions?

keithjeffery commented 5 years ago

Makx, you suggest:

"Could we enumerate a smaller set of crucial licence information, plus useful, but not mandatory extensions?"

I think this is the optimal approach.

Best, Keith


makxdekkers commented 5 years ago

@keithjeffery OK, let's then try to get some more opinions and suggestions from others in the WG.

micheldumontier commented 5 years ago

you might be interested in the indicators outlined here: http://reusabledata.org/

micheldumontier commented 5 years ago

as a general comment, we expect that maximum FAIRness is achieved when machines can interpret the terms and conditions in a license (see also smart contract)

makxdekkers commented 5 years ago

@micheldumontier Do you think it is possible to require existing licences (e.g. CC-BY) to be fully machine-understandable, e.g. with explicit permissions, prohibitions, obligations etc., the way ODRL does? If so, could the third indicator above be reformulated as:

Machine-understandable licence
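To make "machine-understandable" concrete, here is a minimal sketch of a licence expressed as explicit permissions, prohibitions and obligations. It is only ODRL-inspired: the `LicenceTerms` class, the action names and the `check_action` helper are hypothetical, and the CC BY entry is a hand-written approximation rather than an official description of that licence.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LicenceTerms:
    """Simplified, ODRL-inspired view of a licence (hypothetical structure)."""
    identifier: str
    permissions: frozenset = field(default_factory=frozenset)
    prohibitions: frozenset = field(default_factory=frozenset)
    obligations: frozenset = field(default_factory=frozenset)


# Hand-written approximation of CC BY 4.0; the action names are assumptions.
CC_BY = LicenceTerms(
    identifier="https://creativecommons.org/licenses/by/4.0/",
    permissions=frozenset({"reproduce", "distribute", "derive"}),
    obligations=frozenset({"attribute"}),
)


def check_action(licence: LicenceTerms, action: str):
    """Return (allowed, obligations) for an action under the licence.

    An explicit prohibition wins, and an action that is not listed as a
    permission is treated as not allowed.
    """
    if action in licence.prohibitions or action not in licence.permissions:
        return False, frozenset()
    return True, licence.obligations
```

With such a structure an evaluator could ask `check_action(CC_BY, "distribute")` and learn both that the action is permitted and that attribution is owed, which is the kind of answer a plain licence URL cannot give.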

micheldumontier commented 5 years ago

@makxdekkers I think that this is highly desirable, yes.

markwilkinson commented 5 years ago

I would make this a tiered metric (as we have done for several of the Maturity Indicators in our project). The idea is to have a "license - weak compliance" metric that says "can a machine FIND the license, regardless of what its value is", and then a second metric, "license - strong compliance", that says "can a machine process the license that it finds". At least for the moment, being able to FIND the license in most metadata records is already a struggle, because existing standards are not well harmonized around this property. Looking at the LOV registry, I see ~13 distinct predicates that could be interpreted as pointing at some kind of license. It would be nice if that could be pruned down to a small handful. Then we could look at the machine-readability of the values of those predicates (where even "readability" is tricky: a lot of the CC licenses have an RDF presentation that reflects the common structural components of a license, but the content is not processable by machines...)
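The two tiers could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual Maturity Indicator tests: the predicate list is a small illustrative subset (not the ~13 predicates from LOV), the flat dictionary record layout is assumed, and `PROCESSABLE_LICENCES` is a hypothetical registry of licences for which a machine-processable description is taken to exist.

```python
# Illustrative subset of predicates that may point at a licence; this is
# NOT the list of ~13 predicates found in the LOV registry.
LICENCE_PREDICATES = (
    "http://purl.org/dc/terms/license",
    "https://schema.org/license",
    "http://creativecommons.org/ns#license",
)

# Hypothetical registry of licences assumed to have a machine-processable
# description of their terms; the URL is a placeholder.
PROCESSABLE_LICENCES = {
    "https://example.org/licences/machine-readable-demo",
}


def find_licence(metadata: dict):
    """Weak tier: return the first licence value found, else None."""
    for predicate in LICENCE_PREDICATES:
        value = metadata.get(predicate)
        if value:
            return value
    return None


def licence_compliance(metadata: dict) -> str:
    """Return 'none', 'weak' or 'strong' for a flat metadata record."""
    licence = find_licence(metadata)
    if licence is None:
        return "none"
    return "strong" if licence in PROCESSABLE_LICENCES else "weak"
```

The longer `LICENCE_PREDICATES` has to be, the more fragile the weak tier becomes, which is the argument for pruning the set of predicates down to a small handful.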

makxdekkers commented 5 years ago

@markwilkinson The indicators in the first comment in this issue already have the presence of a licence as an indicator. The additional one that I proposed after @micheldumontier's comment is about the machine-understandability of the licence in terms of permissions, obligations etc. Do you agree that those two (presence of licence and machine-understandable licence) cover the tiered metric you describe? Or would you want to include a recommendation of the specific metadata element where the licence information is to be provided?

markwilkinson commented 5 years ago

Yes, I want a specific metadata element. Without that, I can't find the "thing" that is supposed to represent the license. (unless we go full-on OWL, and then I can check the rdf:type of every value of every metadata element to figure out which one is of type "License" ;-) )

makxdekkers commented 5 years ago

@markwilkinson But how can a specific metadata element be mandated? Where licence information is provided depends on the community standard, doesn't it? Community standards usually have a clear place to provide a link to a licence (DC/DCAT has dct:license, schema.org has schema:license, DataCite metadata schema has a Rights element etc.). Requiring the use of a specific metadata element might then be in conflict with the community standard.

markwilkinson commented 5 years ago

Oh! No, I meant exactly that - that there should BE such an element, formally designated as such by that community, rather than just a randomly coined predicate.

makxdekkers commented 5 years ago

@markwilkinson That's a relief! So an indicator could be:

R1.1-05 Provision of licence information in the appropriate element in the metadata standard used

On the other hand, isn't that a quality issue?

In a way, if metadata does not use the appropriate metadata element, it would also fail on R1.3 as it would not (correctly) follow the relevant community standard.
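As a rough sketch of how R1.1-05 might be checked, the snippet below looks for a licence value in the element that the declared community standard designates for it. The element names are the ones mentioned in this thread; the dictionary keys, the flat record layout and the `check_r1_1_05` function are assumptions made for illustration.

```python
# Licence element per community standard, as named in the discussion.
# The keys and the flat-record layout are assumptions of this sketch.
LICENCE_ELEMENT = {
    "dcat": "dct:license",
    "schema.org": "schema:license",
    "datacite": "rights",
}


def check_r1_1_05(record: dict, standard: str) -> bool:
    """True if the record has a non-empty licence value in the element
    that its declared standard designates for licence information."""
    element = LICENCE_ELEMENT.get(standard.lower())
    if element is None:
        raise ValueError(f"no licence element known for standard {standard!r}")
    return bool(record.get(element))
```

A DCAT record that states its licence under some other property, for example `dct:rights`, would fail this check even though licence information is present, which is the overlap with R1.3 noted above.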

keithjeffery commented 5 years ago

@Makx - This R1.1-05 is fine as one criterion. Now that we have it "in the appropriate element" (so it can be queried accurately), the next step is to ensure the element is machine-understandable (formal syntax, declared semantics), i.e. machine-readable ==> machine-understandable. Best, Keith

makxdekkers commented 5 years ago

@keithjeffery Are you proposing an additional indicator? E.g.:

R1.1-06 Provision of machine-understandable licence information

keithjeffery commented 5 years ago

Makx, exactly: you caught my intention. Best, Keith


bahimc commented 5 years ago

Please find below the current version of the indicator(s) and their respective maturity levels for this FAIR principle. Indicators and maturity levels will be presented, as they stand, to the next working group meeting for approval. In the meantime, any comments are still welcome.

The editorial team will now concentrate on weighing and prioritizing these indicators. More information soon.

image

image

bahimc commented 5 years ago

Dear contributors,

Below you can find the indicators and their maturity levels in their current state as a result of the above discussions and workshops.

image image

Please note that this thread is going to be closed within a short period of time. The current state of the indicators, as of early October 2019, is now frozen, with the exception of the indicators for the principles that are concerned with ‘richness’ of metadata (F2 and R1).

The current indicators will be used for the further steps of this WG, which are prioritisation and scoring. Later on, they will be used in a testing phase where owners of evaluation approaches are going to be invited to compare their approaches (questionnaires, tools) against the indicators. The editorial team, in consultation with the Working Group, will define the best approach to test the indicators and evaluate their soundness. As such, the current set of indicators can be seen as an ‘alpha version’. In the first half of 2020, the indicators may be revised and improved, based on the results of the testing.

If you have any further comments or suggestions regarding this specific discussion, please share them with us. In addition, we invite you to have a look at the following two sets of issues.

Prioritisation

• Indicators prioritisation for Findability
• Indicators prioritisation for Accessibility
• Indicators prioritisation for Interoperability
• Indicators prioritisation for Reusability

Scoring

• Indicators for FAIRness | Scoring

We thank you for your valuable input!