tdwg / bdq

Biodiversity Data Quality (BDQ) Interest Group
https://github.com/tdwg/bdq

TG2-VALIDATION_LICENSE_STANDARD #38

Open iDigBioBot opened 6 years ago

iDigBioBot commented 6 years ago
TestField Value
GUID 3136236e-04b6-49ea-8b34-a65f25e3aba1
Label VALIDATION_LICENSE_STANDARD
Description Does the value of dcterms:license occur in bdq:sourceAuthority?
TestType Validation
Darwin Core Class Record-level
Information Elements ActedUpon dcterms:license
Information Elements Consulted
Expected Response EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dcterms:license is bdq:Empty; COMPLIANT if the value of the term dcterms:license is in the bdq:sourceAuthority; otherwise NOT_COMPLIANT
Data Quality Dimension Conformance
Term-Actions LICENSE_STANDARD
Parameter(s) bdq:sourceAuthority
Source Authority bdq:sourceAuthority default = "Creative Commons 4.0 Licenses or CC0" {[https://creativecommons.org/]} { Regular Expression [^((http(s){0,1}://creativecommons[.]org/licenses/(by|by-sa|by-nc|by-nc-sa|by-nd|by-nc-nd)/4[.]0/((deed|legalcode)([.](id|eu|da|de|en|es|fr|fy|hr|it|lv|lt|mi|ni|no|pl|pt|ro|si|fi|sv|tr|cs|el|ru|uk|ar|jp|zh-hans|zh-hant|ko)){0,1}){0,1}|(http(s){0,1}://creativecommons[.]org/publicdomain/zero/1[.]0/((deed|legalcode)([.](id|eu|da|de|en|es|fr|fy|hr|it|lv|lt|ni|no|pl|pt|ro|si|fi|sv|tr|cs|el|ru|uk|ar|jp|zh-hans|zh-hant|ko)){0,1}){0,1})))$]}
Specification Last Updated 2023-09-17
Examples [dcterms:license="https://creativecommons.org/licenses/by/4.0/": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dcterms:license matches a term in the bdq:sourceAuthority"]
[dcterms:license="GPL": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="dcterms:license does not match a term in the bdq:sourceAuthority"]
Source John Wieczorek
References
Example Implementations (Mechanisms)
Link to Specification Source Code
Notes The license at the record level might be derived from the license of the data set from which the record is retrieved. This test must return NOT_COMPLIANT if there is leading or trailing whitespace or there are leading or trailing non-printing characters. The canonical form of the Creative Commons license IRI has nothing after the version, e.g., https://creativecommons.org/licenses/by/4.0/, but it may be followed by deed or legalcode, e.g., https://creativecommons.org/licenses/by/4.0/deed, and this may in turn be followed by a language code. However, only some two-letter language codes have translations, and some translations are identified by a string longer than the two-letter language code (e.g., zh-hans). An IRI with an erroneous language code, or with a language code for which no translation exists, returns a 404 error instead of redirecting to the more general license IRI. As of 2024-02-28, deed.mi doesn't exist yet, but legalcode.mi does.
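The Expected Response above can be sketched in Python as follows. This is a minimal, non-authoritative sketch: the names `validation_license_standard` and `is_bdq_empty` are hypothetical, bdq:Empty is assumed to mean null or only whitespace/non-printing characters, and the pattern is an assumed corrected form of the default Source Authority regex (anchored with `^`, literal dots escaped as `[.]`, and the deed/legalcode suffix optional so that the canonical IRI in the first example is COMPLIANT).

```python
import re
from typing import Optional, Tuple

# Assumed corrected default pattern: anchored, literal dots escaped as [.],
# and the deed/legalcode suffix optional so the canonical IRI matches.
CC_LICENSE_REGEX = (
    r"^((http(s){0,1}://creativecommons[.]org/licenses/(by|by-sa|by-nc|by-nc-sa|by-nd|by-nc-nd)/4[.]0/"
    r"((deed|legalcode)([.](id|eu|da|de|en|es|fr|fy|hr|it|lv|lt|mi|ni|no|pl|pt|ro|si|fi|sv|tr|cs|el|ru|uk|ar|jp|zh-hans|zh-hant|ko)){0,1}){0,1}"
    r"|(http(s){0,1}://creativecommons[.]org/publicdomain/zero/1[.]0/"
    r"((deed|legalcode)([.](id|eu|da|de|en|es|fr|fy|hr|it|lv|lt|ni|no|pl|pt|ro|si|fi|sv|tr|cs|el|ru|uk|ar|jp|zh-hans|zh-hant|ko)){0,1}){0,1})))$"
)


def is_bdq_empty(value: Optional[str]) -> bool:
    """Hypothetical helper: None, "", or only whitespace/non-printing characters."""
    return value is None or all(ch.isspace() or not ch.isprintable() for ch in value)


def validation_license_standard(
    license_value: Optional[str], source_authority: Optional[str] = CC_LICENSE_REGEX
) -> Tuple[str, Optional[str]]:
    """Return (Response.status, Response.result) following the Expected Response."""
    if source_authority is None:
        # The source authority is not available.
        return ("EXTERNAL_PREREQUISITES_NOT_MET", None)
    if is_bdq_empty(license_value):
        return ("INTERNAL_PREREQUISITES_NOT_MET", None)
    # fullmatch rejects leading/trailing whitespace or non-printing characters,
    # including a trailing newline that "$" alone would tolerate.
    if re.fullmatch(source_authority, license_value):
        return ("RUN_HAS_RESULT", "COMPLIANT")
    return ("RUN_HAS_RESULT", "NOT_COMPLIANT")
```

Using `re.fullmatch` rather than `re.match` means a value such as `"https://creativecommons.org/licenses/by/4.0/\n"` returns NOT_COMPLIANT, in line with the whitespace rule in the Notes.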
iDigBioBot commented 6 years ago

Comment by Christian Gendreau (@cgendreau) migrated from spreadsheet: I would say "parsable" rather than "valid", since validity depends on the context.

chicoreus commented 6 years ago

Correcting namespace for license in information element s/dc/dcterms/

This may require renaming the test. See the Darwin Core RDF guide for discussion of use of dcterms:license for non-literals (IRIs to resources) and xmpRights:usageTerms for literals.

See also #99 and #133 which may also need renaming.

ArthurChapman commented 6 years ago

Corrected dc:license to dcterms:license throughout

ArthurChapman commented 6 years ago

@chicoreus Perhaps we could just change the names of the tests to ...LICENSE... rather than ...DCLICENSE...

Whatever we do, it will be synonymised in Vocabulary.

chicoreus commented 6 years ago

@ArthurChapman Changing the names to LICENSE makes sense to me.

ArthurChapman commented 6 years ago

Closed by mistake

Tasilee commented 2 years ago

Added to Notes: "This test will fail if there are leading or trailing white space or non-printing characters."

Tasilee commented 12 months ago

Splitting bdqffdq:Information Elements into "Information Elements ActedUpon" and "Information Elements Consulted"

chicoreus commented 6 months ago

Updated notes from "fail" to the more specific "This test must return NOT_COMPLIANT if there is leading or trailing whitespace or there are leading or trailing non-printing characters."

chicoreus commented 6 months ago

Per both the current examples in Darwin Core for dcterms:license and section 3.3 of the Darwin Core RDF guide, dcterms:license should only take IRI values such as https://creativecommons.org/licenses/by-sa/4.0/, not string literals such as CC BY-SA.

See: https://dwc.tdwg.org/rdf/#33-imported-dublin-core-terms-that-have-non-literal-objects-and-corresponding-terms-that-have-literal-objects-normative

The compliant example in this test incorrectly uses string literal values.

The specified source authority allows only the version 4.0 CC licenses. This may be desirable, but it may be too narrow a scope. It might be better to specify the full list of CC licenses at https://creativecommons.org/licenses/list.en as the source authority. If not, we should be explicit about why the test is limited to the 4.0 versions.

We should also be explicit about whether forms other than the canonical IRI are acceptable. Creative Commons specifies the form https://creativecommons.org/licenses/by-sa/4.0/ as canonical, with additional variants including https://creativecommons.org/licenses/by-sa/4.0/legalcode, translations such as https://creativecommons.org/licenses/by-sa/4.0/legalcode.en, plain text https://creativecommons.org/licenses/by-sa/4.0/legalcode.txt, and RDF https://creativecommons.org/licenses/by-sa/4.0/rdf.

The examples in Darwin Core include a non-canonical form ending with legalcode.

Translations do not exist in all languages, for example, https://creativecommons.org/licenses/by-sa/4.0/legalcode.cy currently returns a 404 error, not the Welsh translation, or a redirect up to /legalcode for that license.

The specified source authority doesn't provide a list of all variants of the IRIs. It would therefore either need to change to point to the list of licenses at https://creativecommons.org/licenses/list.en (which has links to each extant translation, but no links to the canonical IRIs), or implementations would need to decide how to evaluate whether a specified IRI for a translation is compliant, both by checking the pattern http[s]{0,1}://creativecommons.org/licenses/(by|by-nc|by-nc-nd|by-nc-sa|by-nd|by-sa)/4.0/(legalcode(.[a-z]{2}){0,1}) and by looking up whether the requested IRI returns a 404 error.
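The two-step evaluation described above (pattern check, then a lookup for a 404) could look roughly like this in Python. It is a sketch under stated assumptions: the pattern is the one from this comment with its typos corrected (legalcode, `{0,1}`, the closing parenthesis, and literal dots escaped), the function names are hypothetical, and the lookup uses a HEAD request via `urllib`, which follows redirects and is assumed to be answered by the Creative Commons server the same way as a GET. The status lookup is passed in as a parameter so the logic can be exercised without network access.

```python
import re
import urllib.error
import urllib.request
from typing import Callable

# Pattern from the comment above with typos corrected (assumption).
CC4_TRANSLATION_PATTERN = re.compile(
    r"http[s]{0,1}://creativecommons[.]org/licenses/(by|by-nc|by-nc-nd|by-nc-sa|by-nd|by-sa)/4[.]0/"
    r"(legalcode([.][a-z]{2}){0,1})"
)


def http_status(iri: str) -> int:
    """Return the final HTTP status of a HEAD request to iri (network I/O).

    Connection failures (urllib.error.URLError) are not handled in this sketch.
    """
    request = urllib.request.Request(iri, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def is_resolvable_cc4_iri(
    iri: str, fetch_status: Callable[[str], int] = http_status
) -> bool:
    """Step 1: syntactic pattern check. Step 2: reject IRIs that return 404."""
    if CC4_TRANSLATION_PATTERN.fullmatch(iri) is None:
        return False
    return fetch_status(iri) != 404
```

In use, `is_resolvable_cc4_iri("https://creativecommons.org/licenses/by-sa/4.0/legalcode.cy")` would perform a live request; supplying a stub such as `lambda iri: 404` keeps tests offline.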

In either case, specifying the source authority alone isn't sufficient information to determine whether the value of the term dcterms:license is in the bdq:sourceAuthority.

chicoreus commented 6 months ago

Propose we change the source authority to:

bdq:sourceAuthority default = "Creative Commons 4.0 Licenses or CC0 " {[https://creativecommons.org/]} { Regular Expression [ (http(s){0,1}://creativecommons.org/licenses/(by|by-sa|by-nc|by-nc-sa|by-nd|by-nc-nd)/4.0/((deed|legalcode)(.(id|eu|da|de|en|es|fr|fy|hr|it|lv|lt|mi|ni|no|pl|pt|ro|si|fi|sv|tr|cs|el|ru|uk|ar|jp|zh-hans|zh-hant|ko)){0,1})|(http(s){0,1}://creativecommons.org/publicdomain/zero/1.0/((deed|legalcode)(.(id|eu|da|de|en|es|fr|fy|hr|it|lv|lt|ni|no|pl|pt|ro|si|fi|sv|tr|cs|el|ru|uk|ar|jp|zh-hans|zh-hant|ko)){0,1}) ]}

I haven't confirmed that this regex syntax is correct, but it should be close. I also need to double-check the language list for the public domain dedication. The regex will need to be more complex to be robust, as deed.mi doesn't exist yet, but legalcode.mi does.

All three forms (ending in deed, ending in legalcode, or with no ending) are valid, with the canonical form of the license IRI having nothing after the version number.

Only some two-letter language codes have translations, and some translations are identified by a longer string than the two-letter language code. An IRI with an erroneous language code, or with a language code for which no translation exists, returns a 404 error instead of redirecting to the more general license IRI.
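To make the structure of these IRIs concrete, here is a minimal Python sketch that splits a CC 4.0 or CC0 1.0 IRI into its canonical part (ending at the version), the optional deed/legalcode form, and the optional language suffix. The name `split_license_iri` is hypothetical, and the language group accepts any two lowercase letters optionally followed by a hyphenated part (e.g. zh-hans); it checks syntax only, so whether a given translation actually exists still needs a list or a lookup.

```python
import re
from typing import Optional, Tuple

# Hypothetical splitter: canonical IRI, optional deed/legalcode form, optional language tag.
LICENSE_IRI_PARTS = re.compile(
    r"(?P<canonical>https?://creativecommons[.]org/"
    r"(?:licenses/(?:by|by-sa|by-nc|by-nc-sa|by-nd|by-nc-nd)/4[.]0/|publicdomain/zero/1[.]0/))"
    r"(?:(?P<form>deed|legalcode)(?:[.](?P<lang>[a-z]{2}(?:-[a-z]+)?))?)?"
)


def split_license_iri(iri: str) -> Optional[Tuple[str, Optional[str], Optional[str]]]:
    """Return (canonical IRI, form, language), or None if iri is not a CC 4.0/CC0 1.0 IRI."""
    match = LICENSE_IRI_PARTS.fullmatch(iri)
    if match is None:
        return None
    # Groups that did not participate in the match come back as None.
    return match.group("canonical"), match.group("form"), match.group("lang")
```

For example, `split_license_iri("https://creativecommons.org/licenses/by/4.0/deed.zh-hans")` yields the canonical IRI `https://creativecommons.org/licenses/by/4.0/` together with `"deed"` and `"zh-hans"`.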

chicoreus commented 6 months ago

Syntax corrected regex:

^(http(s){0,1}:\/\/creativecommons.org\/licenses\/(by|by-sa|by-nc|by-nc-sa|by-nd|by-nc-nd)\/4.0\/((deed|legalcode)(.(id|eu|da|de|en|es|fr|fy|hr|it|lv|lt|mi|ni|no|pl|pt|ro|si|fi|sv|tr|cs|el|ru|uk|ar|jp|zh-hans|zh-hant|ko)){0,1})|(http(s){0,1}:\/\/creativecommons.org\/publicdomain\/zero\/1.0\/((deed|legalcode)(.(id|eu|da|de|en|es|fr|fy|hr|it|lv|lt|ni|no|pl|pt|ro|si|fi|sv|tr|cs|el|ru|uk|ar|jp|zh-hans|zh-hant|ko)){0,1})))$

ArthurChapman commented 6 months ago

That looks like a solution!

ArthurChapman commented 6 months ago

Your other explanation would be good to put in the Notes.

chicoreus commented 6 months ago

Regex still not quite right. This one tested and works:

^(http(s){0,1}://creativecommons[.]org/licenses/(by|by-sa|by-nc|by-nc-sa|by-nd|by-nc-nd)/4[.]0/((deed|legalcode)(.){0,1}){0,1})|(http(s){0,1}://creativecommons[.]org/publicdomain/zero/1[.]0/((deed|legalcode)(.){0,1}){0,1})$
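A quick way to exercise this regex is with Python's `re` module; the sketch below uses the pattern exactly as posted, and the helper name `matches` is hypothetical. Two properties are visible. With `re.search`, the `^` binds only to the first top-level alternative and the `$` only to the second, so trailing text after a 4.0 license IRI is accepted; `re.fullmatch` avoids this by requiring the whole string to match. Also, `(.){0,1}` allows at most one arbitrary character after deed or legalcode, so a language-suffixed form such as legalcode.en does not match.

```python
import re

# Regex exactly as posted in the comment above.
TESTED = (
    r"^(http(s){0,1}://creativecommons[.]org/licenses/(by|by-sa|by-nc|by-nc-sa|by-nd|by-nc-nd)/4[.]0/((deed|legalcode)(.){0,1}){0,1})"
    r"|(http(s){0,1}://creativecommons[.]org/publicdomain/zero/1[.]0/((deed|legalcode)(.){0,1}){0,1})$"
)


def matches(value: str) -> bool:
    """Whole-string match, independent of how ^ and $ bind to the alternation."""
    return re.fullmatch(TESTED, value) is not None


# re.search accepts trailing text after the first alternative; fullmatch does not.
print(re.search(TESTED, "https://creativecommons.org/licenses/by/4.0/ trailing") is not None)
print(matches("https://creativecommons.org/licenses/by/4.0/ trailing"))
```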

chicoreus commented 6 months ago

@ArthurChapman added the substance of the comment above to the notes.

Tasilee commented 5 months ago

Thanks @ArthurChapman. @chicoreus: How can you render the "|"s in the regular expression into a form acceptable to the GitHub table format? I've tried to add the Source Authority as specified.

ArthurChapman commented 5 months ago

To include a pipe in Markdown table text, use "\|".
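When a long regex has to be pasted into a GitHub table cell, the escaping can be done mechanically. A minimal Python sketch (the function name `escape_table_cell` is hypothetical) that replaces every `|` with `\|`, since GitHub Flavored Markdown tables otherwise treat an unescaped pipe as a column separator:

```python
def escape_table_cell(text: str) -> str:
    """Escape pipes so text stays inside one GitHub Flavored Markdown table cell."""
    return text.replace("|", "\\|")


print(escape_table_cell("(deed|legalcode)"))  # (deed\|legalcode)
```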

Tasilee commented 5 months ago

Thanks @ArthurChapman - done I hope.