tdwg / bdq

Biodiversity Data Quality (BDQ) Interest Group
https://github.com/tdwg/bdq

TG2-AMENDMENT_[]_STANDARDIZED source authorities #185

Closed: tucotuco closed this issue 4 years ago

tucotuco commented 4 years ago

While reviewing Issue #60 on standardizing geodeticDatum, I discovered that at least three other AMENDMENT tests refer to a controlled vocabulary as the source authority, and not to the thesaurus that will be needed to resolve values to that controlled vocabulary. The three issues are Issue #41 (AMENDMENT_DCTYPE_STANDARDIZED), Issue #48 (AMENDMENT_COUNTRYCODE_STANDARDIZED) and Issue #133 (AMENDMENT_LICENSE_STANDARDIZED). There may be others. The source authority for each of these will be the GBIF thesauri, but we don't have a way to refer to them yet. In Issue #60 I used the pattern "[bdq:sourceAuthority = GBIF [unqualified term name] thesaurus, when available]" in the Notes. We should reach a consensus and do the same for all affected tests.
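
As a sketch only, the proposed Notes pattern can be filled in mechanically from a qualified term name by stripping its namespace prefix. The class and method names below are hypothetical, and the qualified names (dwc:geodeticDatum, dc:type, dwc:countryCode, dcterms:license) are assumed from the test names in Issues #60, #41, #48 and #133.

```java
/**
 * Hypothetical helper that renders the bdq:sourceAuthority note pattern
 * proposed in this issue for a given (possibly namespace-qualified) term.
 */
final class SourceAuthorityNote {

    // Strip a namespace prefix such as "dwc:" or "dcterms:" to get the unqualified term name.
    static String unqualified(String term) {
        int colon = term.indexOf(':');
        return colon >= 0 ? term.substring(colon + 1) : term;
    }

    // Fill the "[unqualified term name]" slot of the proposed pattern.
    static String forTerm(String term) {
        return "[bdq:sourceAuthority = GBIF " + unqualified(term) + " thesaurus, when available]";
    }

    public static void main(String[] args) {
        // Assumed terms for the four tests discussed in this issue.
        String[] terms = {"dwc:geodeticDatum", "dc:type", "dwc:countryCode", "dcterms:license"};
        for (String term : terms) {
            System.out.println(forTerm(term));
        }
    }
}
```

Running main prints one note per term, e.g. "[bdq:sourceAuthority = GBIF geodeticDatum thesaurus, when available]", which is the pattern used in the Notes for Issue #60.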

Tasilee commented 4 years ago

I totally agree @tucotuco. It had taken me a while to understand we needed vocabs for validations and thesauri for a subset of the 28 amendments.

The template "bdq:sourceAuthority = GBIF [unqualified term name] thesaurus, when available" seems appropriate, if GBIF are committed. Given they have the most extensive record base on which to base look-ups/translations, this seems logical.

ArthurChapman commented 4 years ago

Sounds like a good plan @tucotuco. We are depending on the development and availability of the GBIF thesauri, but it is an advance on what we have now. It would be good to have @timrobertson100 or GBIF make available some documented plans on the extent and timetable for their proposed thesauri.

Tasilee commented 4 years ago

In looking through the amendments, we use phrases inconsistently when referring to how the amendments are made: "changed to"; "converted from"; "interpreted from"; "determined from"; "added from"; "FILLED_IN from"; "standardized using"; "populated from". These do, however, provide some classification of amendment types.

In some cases, we reject anything but exact matches. In other cases, we presume code will enable conversions. In still other cases, we presume a thesaurus of some type. We need to standardize the phrasing so that it is clear what type of operation is being used.
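
The three kinds of operation above (exact match only, conversion done in code, thesaurus lookup) can be contrasted in a small sketch. Everything here is hypothetical and illustrative: the enum, the method, and the tiny country-code vocabulary and thesaurus are assumptions standing in for a real controlled vocabulary and a future GBIF thesaurus.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/**
 * Sketch of three kinds of amendment operation, using a hypothetical
 * country-code vocabulary and thesaurus (illustrative values only).
 */
final class AmendmentOperations {

    enum Operation { EXACT_MATCH, CODE_CONVERSION, THESAURUS_LOOKUP }

    // Hypothetical controlled vocabulary: a tiny subset of ISO 3166-1 alpha-2 codes.
    static final Set<String> VOCABULARY = Set.of("AU", "BR", "US");

    // Hypothetical thesaurus standing in for a GBIF thesaurus: verbatim variant -> vocabulary value.
    static final Map<String, String> THESAURUS = Map.of(
            "australia", "AU",
            "brasil", "BR",
            "united states", "US");

    static Optional<String> amend(String value, Operation op) {
        switch (op) {
            case EXACT_MATCH:
                // Accept only a value that is already in the vocabulary, unchanged.
                return VOCABULARY.contains(value) ? Optional.of(value) : Optional.empty();
            case CODE_CONVERSION: {
                // A purely mechanical conversion (trim and upper-case) performed in code.
                String converted = value.trim().toUpperCase(Locale.ROOT);
                return VOCABULARY.contains(converted) ? Optional.of(converted) : Optional.empty();
            }
            case THESAURUS_LOOKUP:
                // Resolve a verbatim variant to a vocabulary value through the thesaurus.
                return Optional.ofNullable(THESAURUS.get(value.trim().toLowerCase(Locale.ROOT)));
            default:
                throw new IllegalArgumentException("Unknown operation: " + op);
        }
    }
}
```

The point of the sketch is that the same input ("Australia") yields nothing under an exact match or a code conversion but resolves under a thesaurus lookup, which is why the amendment wording should say which kind of operation is intended.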

The amendments that IMPLY some form of conversion/translation are #60, #41, #48, #133, #163, #115, #63, #45, #26, #127, #43(?), #52, #61, #86, #128, #55, #68, #32

ArthurChapman commented 4 years ago

I generally have no problem with most of these:

  1. #60, #41, #48, #133, #163, #115, #63 [all "standardized from"] OK

  2. #26, #127, #52, #61, #86, #128, #68 [date-related ones that mention "unambiguously interpreted" ... We had lots of discussion in Gainesville about "unambiguously interpreted" with these date terms, so I see no problem] OK

  3. #55 [I would change the wording to "were unambiguously interpreted from" rather than "were unambiguously determined from" to conform with #68 et al.]
  4. #43 [This is a special case and the only "Conversion" type] OK

  5. #128 [could possibly alter to say "unambiguously interpreted as" to conform with #68 et al.] OK
  6. #32 [Again a special case; could possibly change "were populated from" to "changed to unambiguously conform to"].

So I only see #55, #32 and possibly #128 as worth changing.

Tasilee commented 4 years ago

These are the change components of the amendments:

- could be unambiguously interpreted from the value provided in dwc
- has been standardized
- has been standardized using the bdq
- have been filled in from a valid unambiguously interpretable value in dwc
- have been unambiguously interpreted given the specified source authority service
- to the Parameter value if
- was added from a successful lookup of dwc
- was altered to unambiguously conform with the ISO
- was altered to unambiguously conform with the ISO
- was changed to comply with standard values from bdq
- was changed to unambiguously conform with an ISO
- was FILLED_IN from any of the fields dwc
- was FILLED_IN from the values in dwc
- was interpretable to be a integer
- was interpreted from the values in dwc
- was set to the predefined default value
- was standardized to conform with DCMI
- was standardized using a specified source authority service
- was standardized using the specified source authority service
- was standardized using the specified source authority service
- was unambiguously inferred from supplied dwc
- was unambiguously interpreted from dwc
- was unambiguously interpreted to be an integer
- were changed based on a conversion between spatial reference systems
- were populated from information in verbatim coordinate information
- were unambiguously determined from dwc
- were unambiguously interpreted from dwc

As you can see, there are some inconsistencies. I would also question the use of "FILLED_IN" rather than "filled in" or "populated from".

Lee


--


Lee Belbin

Science Adviser Atlas of Living Australia National Research Collections Australia, National Collections & Marine Infrastructure, CSIRO | Clunies Ross Street, Acton ACT 2601 | GPO Box 1700, Canberra ACT 2601

Phone: +61 (0)419 374 133 | lee.belbin@csiro.au leebelbin@gmail.com | www.ala.org.au | http://www.csiro.au/en/Research/Collections

chicoreus commented 4 years ago

Note: FILLED_IN is a special case (it should be defined in the vocabulary and used whenever applicable). It indicates that a term contained no value and that some value has been proposed. This is a different case from the one where a change was proposed (for which we should also have a consistent vocabulary term). See the RDF....

ArthurChapman commented 4 years ago

This has now been added to the Vocabulary (#152):

FILLED_IN: An Amendment (q.v.) that indicates that a term contained no value, and that some value has been proposed.

chicoreus commented 4 years ago

That should be a ResultState (or status; I forget which we are using) produced by an Amendment... It isn't an amendment, but a result status that can be produced by one.

See ffdq-api:

public static ResultState RUN_HAS_RESULT = new ResultState("HAS_RESULT");
public static ResultState NOT_RUN = new ResultState("NOT_RUN");
public static ResultState INTERNAL_PREREQUISITES_NOT_MET = new ResultState("DATA_PREREQUISITES_NOT_MET");
public static ResultState EXTERNAL_PREREQUISITES_NOT_MET = new ResultState("EXTERNAL_PREREQUISITES_NOT_MET");

public static ResultState CHANGED = new ResultState("CHANGED");
public static ResultState FILLED_IN = new ResultState("FILLED_IN");
public static ResultState NO_CHANGE = new ResultState("NO_CHANGE");

And values we aren't using:

public static ResultState TRANSPOSED = new ResultState("TRANSPOSED");
public static ResultState AMBIGUOUS = new ResultState("AMBIGUOUS");
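
To make the FILLED_IN / CHANGED distinction concrete, here is a minimal sketch of how an amendment might pick a status by comparing the original and proposed values. This is not the ffdq-api implementation: the class, enum, and method are hypothetical, and treating a blank proposal as NO_CHANGE is an assumption made for the sketch.

```java
/**
 * Hypothetical sketch (not the ffdq-api implementation) of choosing between
 * FILLED_IN, CHANGED and NO_CHANGE for a single term.
 */
final class AmendmentStatusSketch {

    enum Status { FILLED_IN, CHANGED, NO_CHANGE }

    // Treat null or whitespace-only values as "no value".
    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    static Status statusFor(String original, String proposed) {
        // Nothing proposed, or the proposal repeats the original: no change.
        if (isBlank(proposed) || proposed.equals(original)) {
            return Status.NO_CHANGE;
        }
        // The term contained no value, and some value has been proposed.
        if (isBlank(original)) {
            return Status.FILLED_IN;
        }
        // The term had a value, and a different value has been proposed.
        return Status.CHANGED;
    }
}
```

Under this sketch, filling an empty term from a predefined default value would also report FILLED_IN, since only the emptiness of the original value matters.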

Tasilee commented 4 years ago

What about "default value"?

Some of the other variants should then also be aligned.

"filled in" (but from dwc...): this would also apply to "were populated from ...", "added from ..." and "was set to the predefined default value".

Even so, there are still plenty of inconsistencies:

- "standardized using" vs "standardized" without saying how it was done
- "have been" vs "was/were"
- "altered" vs "changed"
- "using a specified source authority" vs "using the specified source authority"
- "determined from ..." vs "interpreted from ..." vs "inferred from ..."

ArthurChapman commented 4 years ago

@chicoreus Thanks, we appear to be using "Response".

Tasilee commented 4 years ago

We have a quorum to CLOSE.