tdwg / bdq

Biodiversity Data Quality (BDQ) Interest Group
https://github.com/tdwg/bdq

TG2-AMENDMENT_BASISOFRECORD_STANDARDIZED #63

Open iDigBioBot opened 6 years ago

iDigBioBot commented 6 years ago
TestField Value
GUID 07c28ace-561a-476e-a9b9-3d5ad6e35933
Label AMENDMENT_BASISOFRECORD_STANDARDIZED
Description Proposes an amendment to the value of dwc:basisOfRecord using the bdq:sourceAuthority.
TestType Amendment
Darwin Core Class Record-level
Information Elements ActedUpon dwc:basisOfRecord
Information Elements Consulted
Expected Response EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:basisOfRecord is bdq:Empty; AMENDED the value of dwc:basisOfRecord if it could be unambiguously interpreted as a value in the bdq:sourceAuthority; otherwise NOT_AMENDED
Data Quality Dimension Conformance
Term-Actions BASISOFRECORD_STANDARDIZED
Parameter(s) dwc:basisOfRecord vocabulary
Source Authority bdq:sourceAuthority default = "Darwin Core basisOfRecord" {[https://dwc.tdwg.org/terms/#dwc:basisOfRecord]} {dwc:basisOfRecord vocabulary [https://rs.gbif.org/vocabulary/dwc/basis_of_record.xml]}
Specification Last Updated 2024-07-24
Examples [dwc:basisOfRecord="Human obs": Response.status=AMENDED, Response.result=dwc:basisOfRecord="HumanObservation", Response.comment="dwc:basisOfRecord contains interpretable value"]
[dwc:basisOfRecord="FossilSpecimen": Response.status=NOT_AMENDED, Response.result="", Response.comment="dwc:basisOfRecord contains match in the bdq:sourceAuthority so NOT_AMENDED"]
Source VertNet
References
Example Implementations (Mechanisms)
Link to Specification Source Code
Notes The term dwc:basisOfRecord has the comment "Recommended best practice is to use a controlled vocabulary such as the set of local names of the identifiers for classes in Darwin Core." The list of these values can be determined by searching https://github.com/tdwg/dwc/blob/master/vocabulary/term_versions.csv for rows with status="recommended" and rdf_type="http://www.w3.org/2000/01/rdf-schema#Class". For example, the term http://rs.tdwg.org/dwc/terms/PreservedSpecimen has a local name PreservedSpecimen. For tests against a dwc:Occurrence record, the set of valid terms is more limited and embodied in the resource found at https://rs.gbif.org/vocabulary/dwc/basis_of_record.xml, which contains the local name for the identifier, as well as preferred and alternate labels from which to standardize values.
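
The Expected Response above can be read as a short decision sequence. The following is a minimal sketch of that sequence, not a reference implementation. The function name `amend_basis_of_record`, the abbreviated `VOCABULARY` dictionary, its alternate labels, and the case- and whitespace-insensitive matching rule are all assumptions made for illustration; a real run would load the values from the bdq:sourceAuthority.

```python
from typing import Optional

# Hypothetical, abbreviated stand-in for the bdq:sourceAuthority:
# local name -> alternate labels (the labels here are illustrative only).
VOCABULARY = {
    "HumanObservation": ["Human Observation", "human obs"],
    "FossilSpecimen": ["Fossil Specimen", "fossil"],
    "PreservedSpecimen": ["Preserved Specimen", "specimen"],
}


def _key(value: str) -> str:
    """Comparison key that ignores case and all whitespace (an assumed rule)."""
    return "".join(value.split()).lower()


def _response(status: str, comment: str, result: Optional[dict] = None) -> dict:
    return {"status": status, "result": result or {}, "comment": comment}


def amend_basis_of_record(basis_of_record: Optional[str],
                          vocabulary: Optional[dict] = VOCABULARY) -> dict:
    # None stands for "the source authority could not be reached".
    if vocabulary is None:
        return _response("EXTERNAL_PREREQUISITES_NOT_MET",
                         "bdq:sourceAuthority is not available")
    if basis_of_record is None or basis_of_record.strip() == "":
        return _response("INTERNAL_PREREQUISITES_NOT_MET",
                         "dwc:basisOfRecord is bdq:Empty")
    # An exact match is already standard, so nothing is proposed.
    if basis_of_record in vocabulary:
        return _response("NOT_AMENDED",
                         "dwc:basisOfRecord contains match in the bdq:sourceAuthority")
    key = _key(basis_of_record)
    matches = [name for name, labels in vocabulary.items()
               if key in {_key(name), *(_key(label) for label in labels)}]
    # Only an unambiguous interpretation leads to a proposed amendment.
    if len(matches) == 1:
        return _response("AMENDED",
                         "dwc:basisOfRecord contains interpretable value",
                         {"dwc:basisOfRecord": matches[0]})
    return _response("NOT_AMENDED",
                     "dwc:basisOfRecord could not be unambiguously interpreted")
```

Under these assumptions, `amend_basis_of_record("Human obs")` proposes "HumanObservation", while `amend_basis_of_record("FossilSpecimen")` returns NOT_AMENDED, matching the two Examples in the table.
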
iDigBioBot commented 6 years ago

Comment by Arthur Chapman (@ArthurChapman) migrated from spreadsheet: Should follow on after Line 57

ArthurChapman commented 5 years ago

What is the case if an institution has its entire collection as one type of dwc:basisOfRecord (e.g., everything is a "FossilSpecimen")? Is there a case, then, that if the field is EMPTY it can be populated from a source authority that has just one value for that institution, namely "FossilSpecimen"? If so, we would leave EMPTY out of INTERNAL_PREREQUISITES_NOT_MET.

tucotuco commented 5 years ago

I would be a hard-ass. If every row is of the same type, it is trivial to provide the value. This is a record-level test, and we cannot rely on metadata to get the information.

ArthurChapman commented 5 years ago

I wasn't thinking of using metadata, but of an example where an institution running the tests could set their Parameter to just one value. Otherwise, why is it Parameterized? But I am happy either way.

tucotuco commented 5 years ago

It is currently parametrized to provide a source authority against which to check.

Tasilee commented 5 years ago

We have two levels related to 'source authority' - the authority itself (Parameter required) and the terms it contains (VOCABULARY)?

Except for #75, all tests that have 'VOCABULARY' also have 'Parameterized'. VOCABULARY is either Darwin Core, which I'd call internal as the tests have this as a foundation, or an external authority. Maybe we need to be explicit, as we are with the full specifications of the Expected Responses for annotations even when they have a corresponding validation. That is, do we need to specify Darwin Core as the source authority where relevant?

Am I rambling? It wouldn't be the first time.

tucotuco commented 5 years ago

I do not see that issue #75 is or ever was parametrized.

Yes, the tests are designed to be used against concepts that match the definitions of the Darwin Core terms they reference, and so we should not have Darwin Core as an authority in any of our extant tests. However, "vocabularies of values" designed for use with Darwin Core (or indeed recommended to be used from the Darwin Core side) are not Darwin Core. I would say that these authorities always should be parametrized to decouple the tests from content that is much more mutable over time than the definitions of the Darwin Core terms.

Tasilee commented 1 year ago

Changed Source Authority from

bdq:sourceAuthority default = "Darwin Core Terms" [https://dwc.tdwg.org/terms/#dwc:basisOfRecord]

to

bdq:sourceAuthority default = {Darwin Core} {Basis of record [https://dwc.tdwg.org/terms/#dwc:basisOfRecord] }

and removed bdq:sourceAuthority from Parameters (I presume, as there is no alternative vocab)?

Tasilee commented 1 year ago

Amended Source Authority values to align with @chicoreus syntax

bdq:sourceAuthority default = {Darwin Core} {Basis of record [https://dwc.tdwg.org/terms/#dwc:basisOfRecord]}

to

bdq:sourceAuthority default = "Darwin Core dwc:basisOfRecord" {[https://dwc.tdwg.org/terms/#dwc:basisOfRecord]}

Tasilee commented 1 year ago

Post Zoom 11/7/2023, I have aligned the Source Authority with the suggested syntax:

bdq:sourceAuthority default = "Darwin Core dwc:basisOfRecord" {[https://dwc.tdwg.org/terms/#dwc:basisOfRecord]}

to

bdq:sourceAuthority default = "Darwin Core" {https://dwc.tdwg.org/} {dwc:basisOfRecord [https://dwc.tdwg.org/terms/#dwc:basisOfRecord]}

Tasilee commented 1 year ago

Due to recent discussions, changed Source Authority from

bdq:sourceAuthority default = "Darwin Core" {[https://dwc.tdwg.org/]} {dwc:basisOfRecord [https://dwc.tdwg.org/terms/#dwc:basisOfRecord]}

to

bdq:sourceAuthority default = "Darwin Core basisOfRecord" {[https://dwc.tdwg.org/terms/#dwc:basisOfRecord]} {Basis of record vocabulary [https://rs.gbif.org/vocabulary/dwc/basis_of_record.xml]}

Notes by @tucotuco required

Tasilee commented 1 year ago

I missed the Parameter(s) (added) and the syntax on the vocabulary in Source Authority (done)

tucotuco commented 1 year ago

Updated comment from blank to

"The term dwc:basisOfRecord has the comment "Recommended best practice is to use the standard label of one of the Darwin Core classes." The list of these values can be determined by searching https://github.com/tdwg/dwc/blob/master/vocabulary/term_versions.csv for rows with status="recommended" and rdf_type="http://www.w3.org/2000/01/rdf-schema#Class". For tests against a dwc:Occurrence record, the set of valid terms is more limited and embodied in the resource found at https://rs.gbif.org/vocabulary/dwc/basis_of_record.xml, which contains both preferred labels and alternate labels from which to standardize values. This test will fail if there is leading or trailing whitespace or there are leading or trailing non-printing characters."

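The row filter described in the note (status="recommended" and rdf_type equal to the RDFS Class IRI) could be sketched as follows. This is an illustration under stated assumptions: the column names `status`, `rdf_type`, and `term_iri` are assumed rather than checked against the current term_versions.csv, the inline `SAMPLE` text is a made-up stand-in for the real file, and the `OldClass` row is hypothetical. The local name is taken as the last path segment of the term IRI.

```python
import csv
import io

RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"

# Tiny inline stand-in for term_versions.csv (assumed columns, invented rows).
SAMPLE = """term_localName,status,rdf_type,term_iri
PreservedSpecimen,recommended,http://www.w3.org/2000/01/rdf-schema#Class,http://rs.tdwg.org/dwc/terms/PreservedSpecimen
basisOfRecord,recommended,http://www.w3.org/1999/02/22-rdf-syntax-ns#Property,http://rs.tdwg.org/dwc/terms/basisOfRecord
HumanObservation,recommended,http://www.w3.org/2000/01/rdf-schema#Class,http://rs.tdwg.org/dwc/terms/HumanObservation
OldClass,superseded,http://www.w3.org/2000/01/rdf-schema#Class,http://rs.tdwg.org/dwc/terms/OldClass
"""


def recommended_class_local_names(csv_text: str) -> list:
    """Return sorted local names of rows that are recommended RDFS classes."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sorted(
        row["term_iri"].rsplit("/", 1)[-1]  # local name = last IRI segment
        for row in reader
        if row["status"] == "recommended" and row["rdf_type"] == RDFS_CLASS
    )


print(recommended_class_local_names(SAMPLE))
```

With the sample rows above, only PreservedSpecimen and HumanObservation pass both conditions; the property row and the superseded row are dropped.
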
Tasilee commented 11 months ago

Splitting bdqffdq:Information Elements into "Information Elements ActedUpon" and "Information Elements Consulted".

Also changed "Field" to "TestField", "Output Type" to "TestType" and updated "Specification Last Updated"

chicoreus commented 6 months ago

Updated note to remove evident copy/paste error of fail on whitespace text. Leading or trailing whitespace is one condition this amendment should be able to propose a correction for.
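
As a rough sketch of that condition, leading and trailing whitespace and non-printing characters can be trimmed before the value is compared with the vocabulary. The helper name `trim_edges` is hypothetical, and treating "non-printing" as whatever Python's `str.isprintable()` rejects is an assumption, not something the specification defines.

```python
def trim_edges(value: str) -> str:
    """Drop leading/trailing whitespace and non-printing characters only."""
    def droppable(ch: str) -> bool:
        return ch.isspace() or not ch.isprintable()

    start, end = 0, len(value)
    while start < end and droppable(value[start]):
        start += 1
    while end > start and droppable(value[end - 1]):
        end -= 1
    # Interior characters, including spaces, are left untouched.
    return value[start:end]
```

For instance, `trim_edges("\x00 PreservedSpecimen\t\n")` yields "PreservedSpecimen", which the amendment could then propose if it differs from the original value.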

chicoreus commented 1 month ago

Note that the labels contain spaces, e.g. "Preserved Specimen", not "PreservedSpecimen".

Updating the examples from:

[dwc:basisOfRecord="Human obs": Response.status=AMENDED, Response.result=dwc:basisOfRecord="HumanObservation", Response.comment="dwc:basisOfRecord contains interpretable value"]

[dwc:basisOfRecord="FossilSpecimen": Response.status=NOT_AMENDED, Response.result="", Response.comment="dwc:basisOfRecord contains match in bdq:sourceAuthority so NOT_AMENDED"]

to

[dwc:basisOfRecord="Human obs": Response.status=AMENDED, Response.result=dwc:basisOfRecord="Human Observation", Response.comment="dwc:basisOfRecord contains interpretable value"]

[dwc:basisOfRecord="Fossil Specimen": Response.status=NOT_AMENDED, Response.result="", Response.comment="dwc:basisOfRecord contains match in bdq:sourceAuthority so NOT_AMENDED"]

Validation data for dataID rows 438, 439, 440, 441, 442, 443, 444, 445, and 446 need to be examined, and at least 443-446 need to be corrected to reflect spaces in the labels.

chicoreus commented 1 month ago

Added an example to the note.

Needs Work label currently applies to the validation data rather than the specification.

tucotuco commented 1 month ago

I don't agree with this one. The term names are the standard (HumanObservation), not their labels. From https://dwc.tdwg.org/terms/#dwc:basisOfRecord: "Recommended best practice is to use a controlled vocabulary such as the set of local names of the identifiers for classes in Darwin Core." Examples: HumanObservation

chicoreus commented 1 month ago

@tucotuco Good. I like local names better. It feels like a better fit with more people's practices. It looks like the Darwin Core best-practice recommendation for the term has changed. On July 16, 2023, you added the note with the text: "The term dwc:basisOfRecord has the comment 'Recommended best practice is to use the standard label of one of the Darwin Core classes.'"

I'd be very much in favor of changing the test note and examples and keeping the validation data with the local names (without spaces).

chicoreus commented 1 month ago

Updated comment and examples accordingly.

Tasilee commented 1 month ago

I have changed the relevant Test Data records and added a new one. Is NEEDS WORK still needed on this?