tdwg / bdq

Biodiversity Data Quality (BDQ) Interest Group
https://github.com/tdwg/bdq

TG2-AMENDMENT_TYPESTATUS_STANDARDIZED #286

Open ArthurChapman opened 7 months ago

ArthurChapman commented 7 months ago
TestField Value
GUID b3471c65-b53e-453b-8282-abfa27bf1805
Label AMENDMENT_TYPESTATUS_STANDARDIZED
Description Proposes an amendment to the value of dwc:typeStatus using the bdq:sourceAuthority.
TestType Amendment
Darwin Core Class dwc:Occurrence
Information Elements ActedUpon dwc:typeStatus
Information Elements Consulted
Expected Response EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:typeStatus is bdq:Empty; AMENDED the value of the first word in each | delimited portion of dwc:typeStatus if it can be unambiguously matched to a term in the bdq:sourceAuthority; otherwise NOT_AMENDED.
Data Quality Dimension Conformance
Term-Actions TYPESTATUS_STANDARDIZED
Parameter(s) bdq:sourceAuthority
Source Authority bdq:sourceAuthority default = "Darwin Core typeStatus" {[https://dwc.tdwg.org/list/#dwc_typeStatus]} {dwc:typeStatus vocabulary API [https://gbif.github.io/parsers/apidocs/org/gbif/api/vocabulary/TypeStatus.html]}
Specification Last Updated 2024-08-16
Examples [dwc:typeStatus="Holo.": Response.status=AMENDED, Response.result=dwc:typeStatus="Holotype", Response.comment="dwc:typeStatus found in the bdq:sourceAuthority"]
[dwc:typeStatus="x": Response.status=NOT_AMENDED, Response.result="", Response.comment="dwc:typeStatus not found in the bdq:sourceAuthority"]
Source TG2
References
Example Implementations (Mechanisms)
Link to Specification Source Code
Notes Valuable for data quality needs related to voucher specimens in natural science collections. Almost all occurrence data will have no value in dwc:typeStatus. For reference, a vocabulary of synonyms for dwc:typeStatus can be found at [https://registry.gbif.org/vocabulary/TypeStatus/concepts].
ArthurChapman commented 7 months ago

@chicoreus @tucotuco I'm not sure that the link I have for the API is an actual API or if one exists (https://gbif.github.io/parsers/apidocs/org/gbif/api/vocabulary/TypeStatus.htm) thus the NEEDS WORK label

ymgan commented 6 months ago

@CecSve mentioned in #284 that GBIF is working on the typeStatus vocabulary (https://github.com/gbif/vocabulary/issues/87). Flagging this here.

ArthurChapman commented 6 months ago

Changed to Immature/Incomplete pending development of Vocabulary by GBIF

tucotuco commented 5 months ago

> Changed to Immature/Incomplete pending development of Vocabulary by GBIF

GBIF has a vocabulary, it just isn't accessible via API from the vocabulary server. Implementations don't necessarily need an API to function. In fact, they would be more efficient or much more efficient without API calls, depending on how they were implemented. In other words, I do not think that having API access to a controlled vocabulary is a requirement for implementation, but having a controlled vocabulary is.
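To illustrate the point about not needing an API, here is a minimal sketch in Python, assuming the controlled vocabulary is bundled with the implementation as a local mapping. The name `lookup_type_status` and the handful of entries in `TYPE_STATUS_SYNONYMS` are hypothetical and chosen only for illustration; they are not taken from the GBIF vocabulary.

```python
from typing import Optional

# Hypothetical, hand-written subset of a type status vocabulary.
# A real implementation would load the full controlled vocabulary from a
# file shipped with the software instead of calling a remote API per record.
TYPE_STATUS_SYNONYMS = {
    "holotype": "Holotype",
    "holo.": "Holotype",
    "lectotype": "Lectotype",
    "neotype": "Neotype",
}


def lookup_type_status(term: str) -> Optional[str]:
    """Return the standard term for a raw value, or None if it is unknown."""
    # Matching is case-insensitive and ignores surrounding whitespace
    # (an assumption of this sketch, not something the test specifies).
    return TYPE_STATUS_SYNONYMS.get(term.strip().lower())
```

Each lookup is then an in-memory dictionary access, which is why a local copy of the vocabulary can be much cheaper than one API call per record.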

ArthurChapman commented 5 months ago

I am happy with that @tucotuco. Any comments @chicoreus?

ArthurChapman commented 5 months ago

Changed to CORE and deleted some wording from Notes. Left as "NEEDS WORK" following discussion with @chicoreus on need for MEASURE test. More discussion needed.

chicoreus commented 1 month ago

Not sure that this is tractable. The expectation for values in dwc:typeStatus is a pipe delimited list of {type status term} of {taxon name} {publication}. The definition explicitly includes the taxon name as part of the expected value: "A list (concatenated and separated) of nomenclatural types (type status, typified scientific name, publication) applied to the subject."

One example includes citation information, the other just type status term and taxon name.

For just type status terms and taxon names, we could probably manage with two source authorities, one for the type status term and one for the taxon name, but with publication citations included, that will not be tractable.

We might get away with conforming the first word of each pipe delimited block to a type status term vocabulary.

Examples in Darwin Core are:

holotype of Ctenomys sociabilis. Pearson O. P., and M. I. Christie. 1985. Historia Natural, 5(37):388

holotype of Pinus abies | holotype of Picea abies
tucotuco commented 1 month ago

We might also make a change term request for dwc:typeStatus and see if that flies.

ArthurChapman commented 1 month ago

Interesting - perhaps we need to do what @tucotuco suggests. Originally, I thought we were just checking against a list of types of Types regardless of other data such as the taxon and the publication. We generally look at terms in isolation, and I hadn't realised that Darwin Core included the taxon name and publication. That certainly makes it a lot more difficult, and I wonder whether it is still worth keeping (as CORE at least - possibly as SUPPLEMENTARY). I believe our original thoughts were to just test whether the type of type was included in a vocabulary - holotype, neotype, lectotype, etc. (i.e. as in https://rs.gbif.org/vocabulary/gbif/type_status_2021-01-18.xml). My suggestion would be to drop this test, as I can't think of another way to word it so that it is consistent with Darwin Core - i.e. taking just the first part of the Darwin Core definition ("A list (concatenated and separated) of nomenclatural types (type status") without the second part. Perhaps the suggestion by @tucotuco or a new Darwin Core term - but it is too late for that for us.

ArthurChapman commented 1 month ago

Perhaps do what @tucotuco suggests and in the meantime drop to Incomplete/Immature.

chicoreus commented 1 month ago

Alternative is to split into parts by the pipe character and evaluate the first word of each part.

Perhaps something like:

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:typeStatus is EMPTY; AMENDED the value of the first word in each | delimited portion of dwc:typeStatus if it can be unambiguously matched to a term in bdq:sourceAuthority; otherwise NOT_AMENDED

tucotuco commented 1 month ago

Also, there is this open issue which we can support. https://github.com/tdwg/dwc/issues/28

ArthurChapman commented 1 month ago

@chicoreus - your suggestion seems reasonable and workable. As discussed under https://github.com/tdwg/dwc/issues/28, a lot of databases have just the type of Type under typeStatus. I see a good case for us to support the DwC proposal, but in the meantime use the pipe suggestion of @chicoreus.

Tasilee commented 1 month ago

With general agreement, I am changing the Expected Response from

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:typeStatus is EMPTY; AMENDED the value of dwc:typeStatus if it can be unambiguously matched to a term in bdq:sourceAuthority; otherwise NOT_AMENDED

to

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:typeStatus is EMPTY; AMENDED the value of the first word in each | delimited portion of dwc:typeStatus if it can be unambiguously matched to a term in bdq:sourceAuthority; otherwise NOT_AMENDED

and updating Specification Last Updated
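The revised Expected Response could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not a reference implementation: the function name `amend_type_status`, the representation of the bdq:sourceAuthority as a plain dictionary from lower-cased synonyms to standard terms, case-insensitive matching, and the treatment of `None` or whitespace-only values as EMPTY are all choices made for this sketch.

```python
from typing import Dict, Optional, Tuple


def amend_type_status(
    type_status: Optional[str],
    vocabulary: Optional[Dict[str, str]],
) -> Tuple[str, Optional[str]]:
    """Return (Response.status, proposed dwc:typeStatus or None)."""
    # Assumption: an unavailable source authority is passed in as None.
    if vocabulary is None:
        return ("EXTERNAL_PREREQUISITES_NOT_MET", None)
    # Assumption: None or whitespace-only counts as EMPTY.
    if type_status is None or type_status.strip() == "":
        return ("INTERNAL_PREREQUISITES_NOT_MET", None)

    changed = False
    portions = []
    for portion in type_status.split("|"):
        # Separate the first word from the rest (taxon name, publication).
        words = portion.strip().split(None, 1)
        if words:
            # One key maps to one term, so a hit is taken as unambiguous.
            standard = vocabulary.get(words[0].lower())
            if standard is not None and standard != words[0]:
                words[0] = standard
                changed = True
        portions.append(" ".join(words))

    if not changed:
        return ("NOT_AMENDED", None)
    # Portions are rejoined with " | ", which normalizes spacing around pipes.
    return ("AMENDED", " | ".join(portions))
```

With a hypothetical vocabulary `{"holo.": "Holotype"}`, the value "Holo." from the Examples field would be amended to "Holotype", while "x" would come back as NOT_AMENDED. Only the first word of each portion is touched, so taxon names and citations pass through unchanged.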

ArthurChapman commented 1 month ago

I wonder if it should be "value of the first word in the first | delimited portion" rather than "value of the first word in each | delimited portion"

chicoreus commented 1 month ago

On Sat, 03 Aug 2024 17:07:25 -0700 Arthur Chapman wrote:

> I wonder if it should be "value of the first word in the first | delimited portion" rather than "value of the first word in each | delimited portion"

In each portion, as each portion is expected to be a string in the form {typestatus} of {scientific name} {publication}.

Some specimens are types for more than one name.

ArthurChapman commented 1 month ago

@chicoreus - how do you see the parsing of this with the pipes (|)?

chicoreus commented 1 month ago

On Sat, 03 Aug 2024 18:22:41 -0700 Arthur Chapman wrote:

> @chicoreus - how do you see the parsing of this with the pipes (|)?

In incomplete pseudocode:

    elements = split(typeStatus, '|')
    for each element in elements {
        if (first word in element not found in vocabulary) {
            compliantFlag = false
        }
    }
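The pseudocode above might translate to Python along these lines. This is a sketch only: the function name `first_words_compliant`, the use of a set of lower-cased terms as the vocabulary, case-insensitive matching, and treating an empty portion as non-compliant are assumptions not stated in the thread.

```python
from typing import Set


def first_words_compliant(type_status: str, vocabulary: Set[str]) -> bool:
    """Check that every pipe-delimited portion starts with a known term."""
    compliant = True
    for element in type_status.split("|"):
        words = element.split()
        # Assumption: a portion with no words at all is non-compliant.
        if not words or words[0].lower() not in vocabulary:
            compliant = False
    return compliant
```

For example, with a hypothetical vocabulary `{"holotype", "lectotype"}`, the Darwin Core example "holotype of Pinus abies | holotype of Picea abies" passes because both portions begin with a recognized term.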

Tasilee commented 3 weeks ago

I needed to add "\" before the pipe in the Expected Response so that it is interpreted correctly.