tdwg / bdq

Biodiversity Data Quality (BDQ) Interest Group
https://github.com/tdwg/bdq

TG2-VALIDATION_ESTABLISHMENTMEANS_STANDARD #268

Open ArthurChapman opened 7 months ago

ArthurChapman commented 7 months ago
GUID: 4eb48fdf-7299-4d63-9d08-246902e2857f
Label: VALIDATION_ESTABLISHMENTMEANS_STANDARD
Description: Does the value of dwc:establishmentMeans occur in the bdq:sourceAuthority?
TestType: Validation
Darwin Core Class: dwc:Occurrence
Information Elements ActedUpon: dwc:establishmentMeans
Information Elements Consulted:
Expected Response: EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:establishmentMeans is bdq:Empty; COMPLIANT if the value of dwc:establishmentMeans is in the bdq:sourceAuthority; otherwise NOT_COMPLIANT.
Data Quality Dimension: Conformance
Term-Actions: ESTABLISHMENTMEANS_STANDARD
Parameter(s): bdq:sourceAuthority
Source Authority: bdq:sourceAuthority default = "Establishment Means Controlled Vocabulary List of Terms" {[https://dwc.tdwg.org/em/]} {GBIF vocabulary API [https://api.gbif.org/v1/vocabularies/EstablishmentMeans/concepts]}
Specification Last Updated: 2024-02-08
Examples:
  • [dwc:establishmentMeans="native": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dwc:establishmentMeans found in the bdq:sourceAuthority"]
  • [dwc:establishmentMeans="cultivated": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="dwc:establishmentMeans not found in the bdq:sourceAuthority"]
Source: TG2
References:
  • Darwin Core Maintenance Group (2021) Establishment Means Controlled Vocabulary List of Terms. Biodiversity Information Standards (TDWG). http://rs.tdwg.org/dwc/doc/em/
  • Groom et al. (2019) Improving Darwin Core for research and management of alien species. Biodiversity Information Science and Standards 3: e38084. https://doi.org/10.3897/biss.3.38084
Example Implementations (Mechanisms):
Link to Specification Source Code:
Notes: This test must return NOT_COMPLIANT if there is leading or trailing whitespace or there are leading or trailing non-printing characters.
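The Expected Response, Examples and Notes above can be read as a small decision procedure. The following is a minimal sketch under stated assumptions, not a reference implementation: the function and class names are hypothetical, the source authority is passed in as a plain set of values (None meaning it could not be obtained), bdq:Empty is approximated as None or a string made up only of whitespace or non-printing characters, and the comparison is assumed to be exact and case-sensitive, which is what makes leading or trailing whitespace NOT_COMPLIANT.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Response:
    status: str
    result: Optional[str]
    comment: str


def is_bdq_empty(value: Optional[str]) -> bool:
    # Assumption: bdq:Empty = None, or only whitespace / non-printing characters.
    return value is None or all(c.isspace() or not c.isprintable() for c in value)


def validation_establishmentmeans_standard(
    establishment_means: Optional[str],
    source_authority: Optional[Set[str]],
) -> Response:
    # No vocabulary could be obtained (e.g. the authority is unreachable).
    if source_authority is None:
        return Response("EXTERNAL_PREREQUISITES_NOT_MET", None,
                        "bdq:sourceAuthority is not available")
    if is_bdq_empty(establishment_means):
        return Response("INTERNAL_PREREQUISITES_NOT_MET", None,
                        "dwc:establishmentMeans is bdq:Empty")
    # Exact, case-sensitive match, so " native" or "native\n" is NOT_COMPLIANT.
    if establishment_means in source_authority:
        return Response("RUN_HAS_RESULT", "COMPLIANT",
                        "dwc:establishmentMeans found in the bdq:sourceAuthority")
    return Response("RUN_HAS_RESULT", "NOT_COMPLIANT",
                    "dwc:establishmentMeans not found in the bdq:sourceAuthority")


# Illustrative subset of the vocabulary only, not the full list.
authority = {"native", "introduced", "vagrant"}
print(validation_establishmentmeans_standard("native", authority).result)      # COMPLIANT
print(validation_establishmentmeans_standard("cultivated", authority).result)  # NOT_COMPLIANT
```

The two printed cases mirror the two Examples in the specification. Whether matching should really be case-sensitive is a design choice the specification text does not spell out.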
ArthurChapman commented 7 months ago

I am wondering if this should now be CORE. It had a high rating, but wasn't made CORE at the time because a suitable vocabulary didn't exist. One now exists, so this could be made CORE.

tucotuco commented 7 months ago

I agree that it should be CORE now that it has a vocabulary. But then, by rights, there should be similar tests for the two other terms that have formal vocabularies: dwc:degreeOfEstablishment and dwc:pathway. These terms came into existence after we started the BDQ work.

chicoreus commented 7 months ago

I would also concur. More notes in https://github.com/tdwg/bdq/issues/269#issuecomment-1933287001

Tasilee commented 7 months ago

Changed Source Authority from

bdq:sourceAuthority default = "Establishment Means Controlled Vocabulary List of Terms" {[https://api.gbif.org/v1/vocabularies/EstablishmentMeans/concepts]} {dwc:establishmentMeans [https://dwc.tdwg.org/list/#dwc_establishmentMeans]}

to

bdq:sourceAuthority default = "Darwin Core establishmentMeans" {[https://dwc.tdwg.org/list/#dwc_establishmentMeans]} {dwc:establishmentMeans vocabulary [https://dwc.tdwg.org/em/]}

to align with the agreed structure.

Tasilee commented 7 months ago

Arthur and I have had a discussion about the finer details of the phrasing and links for Source Authority. I suggest that we use the current rendition, or a variant, as a template to apply to all tests. I based this form on #104, as we are dealing with a Darwin Core term. There are up to three potential links:

  1. The Darwin Core term and definition
  2. A list of values
  3. An API of values

In this example, we use (1) and (3).

The phrasing in #104, and here for (1), spells out "Darwin Core" followed by the term "establishmentMeans" in camelCase. Is this appropriate?

The phrasing of (3) uses "dwc:establishmentMeans" which seems ok, but are we happy with something like "vocabulary of terms API" for the following text?

We need to be consistent across all tests.
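One way to keep the Source Authority phrasing consistent across tests is to generate it from structured data rather than writing it by hand. The sketch below is a hypothetical helper, not part of any BDQ tooling: it treats each of the potential links listed above (term definition, list of values, API) as an optionally labelled URL and renders them in the bracketed form used in this thread, with an unlabelled link rendered as `{[url]}` and a labelled one as `{label [url]}`.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SourceAuthority:
    """Hypothetical model of a bdq:sourceAuthority default value."""
    label: str
    # Each link is (optional label, URL), e.g. term definition, value list, or API.
    links: List[Tuple[Optional[str], str]] = field(default_factory=list)

    def render(self) -> str:
        parts = [f'bdq:sourceAuthority default = "{self.label}"']
        for link_label, url in self.links:
            if link_label:
                parts.append(f"{{{link_label} [{url}]}}")  # {label [url]}
            else:
                parts.append(f"{{[{url}]}}")  # {[url]}
        return " ".join(parts)


em = SourceAuthority(
    "Establishment Means Controlled Vocabulary List of Terms",
    [(None, "https://dwc.tdwg.org/em/"),
     ("GBIF vocabulary API",
      "https://api.gbif.org/v1/vocabularies/EstablishmentMeans/concepts")],
)
print(em.render())
```

With these inputs the helper reproduces the Source Authority wording that the thread eventually settles on; the choice of which links to include for each test remains an editorial decision.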

ArthurChapman commented 6 months ago

The four tests (#277, #278, #268, #269) should be CORE (I have discussed this with Lee). Some reasons are

chicoreus commented 6 months ago

@ArthurChapman @Tasilee See the comment on #152. We need to sort out the conflated concepts within CORE. This set of issues does not fit into CORE as the UseCase identified by TG3, but it does fit another UseCase we consider central, and it fits into CORE in the sense of the suite of tests we want to include in the Standard. This test is not Supplementary, but it is not CORE in the sense we use for the UseCase.

ArthurChapman commented 6 months ago

@chicoreus See my separate email. CORE, as in the CORE tests, has never been restricted to TG3, and trying to restrict it that way complicates the process. There should be no difference. If you look through the "Source" of the tests, most came from somewhere other than TG3.

chicoreus commented 6 months ago

@ArthurChapman Yes, CORE has meant the outcome of TG3; that was one of our key guidelines for what to include in CORE. Until now, we've been able to get away with conflating two senses of CORE, the research-analysis sense (what taxon, where, when) and the sense of a broader set of tests we are putting forward as part of the standard, because the only scope we've been dealing with is that of the outcome of TG3. The source of the tests is not relevant; CORE has been our filter on those sources.

"There should be no difference" means that we are still conflating two very distinct concepts. See the comment on #152.

What is making the difference now is this set of tests that we think are very important but that don't fit into the data quality needs of CORE; they fit other use cases, but not that one. We need to clearly define the relevant UseCases sensu the framework, and clarify what we mean by CORE.

ArthurChapman commented 6 months ago

@chicoreus TG3 was never one of our key guidelines for determining CORE. Its results were not out until well after we had defined what we meant by CORE and had started developing the tests, and TG3 did not cover all aspects. It was looking at a methodology, but it was never the guiding principle for TG2. Aspirationally, the TG3 methodology is a good methodology for determining Use Cases, but it is not yet robust, and this was a Case Study, not a definitive study. It looked at how you would do it, and the User Stories were examples; they were never meant to be comprehensive. In fact, the lack of responses to many of the questionnaires prevented it from being comprehensive. It was looking at a process (User Story, Use Case, then linking it to the Framework, etc.) and running a proof of concept. If needed, you could write a user story for establishmentMeans if that would satisfy you; it's not hard to do!

Incidentally, I just went through our tests: 105 tests, including 78 of our current CORE tests, were written before TG3 finalised its User Stories. Most were based on existing tests at ALA, iDigBio, CRIA, BISON, etc. and were not related to the TG3 User Stories, although there was obviously some overlap.

chicoreus commented 6 months ago

@ArthurChapman No, TG3 was exactly the thing that shaped CORE. All of our thinking about which tests to include in CORE and what the tests do is shaped by the CORE UseCase: research analysis of Darwin Core occurrence data on which taxa occur where and when. It is implicit in all of our analysis of both which tests to include and what the tests do. Only now that we are starting to describe tests that we think are important but that fall outside the scope of CORE are we seeing that we need to clarify what we mean by CORE: either the use case, or the set of tests we are recommending. In the latter case, we need to provide another name for the use case and specify what the other use case is.

Tasilee commented 6 months ago

Either way, we need to be happy with our definition of CORE, and I'd strongly suggest we include links to our not CORE tags to be clear on what is not CORE!

I can't say TG3's use cases were at the front of my mind when considering new tests. They formed a reference, but they can never be comprehensive in scope, given unknown unknowns :)

chicoreus commented 6 months ago

We should be phrasing the source authority as:

bdq:sourceAuthority default = "Establishment Means Controlled Vocabulary List of Terms" {[https://dwc.tdwg.org/em/]} {GBIF vocabulary API [https://api.gbif.org/v1/vocabularies/EstablishmentMeans/]}

As pointed out by @ManonGros in https://github.com/tdwg/bdq/issues/283#issuecomment-1961476932, the GBIF vocabulary API is documented (https://techdocs.gbif.org/en/openapi/v1/vocabulary#/). Developers can choose the best means to access the API, which for small vocabularies may be caching the JSON export of the vocabulary (https://api.gbif.org/v1/vocabularies/EstablishmentMeans/export). For VALIDATION_term_STANDARD tests, the GBIF API is only likely to provide alternate access to the TDWG controlled vocabulary, but for AMENDMENT_term_STANDARDIZED tests, it looks like the GBIF data will include a larger set of translated labels than the actual standard document, which should be helpful in standardization implementations.
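The caching approach suggested above could look roughly like the following. This is a sketch with assumptions, not GBIF client code: `CachedVocabulary` is a hypothetical name, `fetch` stands in for whatever function actually downloads and parses the JSON export (that parsing is not shown, and the real structure of the export should be checked against the GBIF documentation), and the flat `"name"` key holding each controlled value is an assumption. The only point illustrated is "load once, reuse, and report unavailability as None" so that a test can return EXTERNAL_PREREQUISITES_NOT_MET.

```python
from typing import Callable, Iterable, Optional, Set


class CachedVocabulary:
    """Hypothetical helper: load a vocabulary once, then reuse the cached value set."""

    def __init__(self, fetch: Callable[[], Iterable[dict]], key: str = "name"):
        self._fetch = fetch  # assumed: returns parsed concept records
        self._key = key      # assumed: field holding the controlled value
        self._values: Optional[Set[str]] = None

    def values(self) -> Optional[Set[str]]:
        """Return the cached set, or None if the source could not be reached."""
        if self._values is None:
            try:
                records = self._fetch()
            except OSError:
                # Network-style failure: nothing is cached, so the next call retries.
                return None
            self._values = {r[self._key] for r in records if self._key in r}
        return self._values


# Usage with a stand-in fetcher (no network access).
def fake_fetch():
    return [{"name": "native"}, {"name": "introduced"}]


vocab = CachedVocabulary(fake_fetch)
print(sorted(vocab.values()))  # ['introduced', 'native']
```

Catching `OSError` covers the `urllib` family of network errors, since `urllib.error.URLError` is a subclass of it; a real implementation may also want a time-to-live so the cache picks up vocabulary updates.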

chicoreus commented 6 months ago

Updated notes from "fail" to more specific "This test must return NOT_COMPLIANT if there is leading or trailing whitespace or there are leading or trailing non-printing characters."

Tasilee commented 6 months ago

Thanks @chicoreus - changing Source Authority from

bdq:sourceAuthority default = "Darwin Core establishmentMeans" {[https://dwc.tdwg.org/list/#dwc_establishmentMeans]} {dwc:establishmentMeans vocabulary API [https://api.gbif.org/v1/vocabularies/EstablishmentMeans/concepts]}

to

bdq:sourceAuthority default = "Establishment Means Controlled Vocabulary List of Terms" {[https://dwc.tdwg.org/em/]} {GBIF vocabulary API [https://api.gbif.org/v1/vocabularies/EstablishmentMeans/]}

tucotuco commented 5 months ago

I think https://api.gbif.org/v1/vocabularies/EstablishmentMeans/concepts is OK as a source authority because it actually has an API, as long as it is understood that the actual vocabulary is maintained at https://dwc.tdwg.org/em/ and that the GBIF API is expected to remain up to date with it.

Tasilee commented 4 months ago

Changed Source Authority from

bdq:sourceAuthority default = "Establishment Means Controlled Vocabulary List of Terms" {[https://dwc.tdwg.org/em/]} {GBIF vocabulary API [https://api.gbif.org/v1/vocabularies/EstablishmentMeans/concepts]}

to

bdq:sourceAuthority default = "GBIF EstablishmentMeans Vocabulary" {[https://api.gbif.org/v1/vocabularies/EstablishmentMeans]} {"dwc:establishmentMeans vocabulary API" [https://api.gbif.org/v1/vocabularies/EstablishmentMeans/concepts]}

tucotuco commented 4 months ago

Source Authority should be

bdq:sourceAuthority default = "Establishment Means Controlled Vocabulary List of Terms" {[https://dwc.tdwg.org/em/]} {GBIF vocabulary API [https://api.gbif.org/v1/vocabularies/EstablishmentMeans/concepts]}

Tasilee commented 4 months ago

Changed Source Authority from

bdq:sourceAuthority default = "GBIF EstablishmentMeans Vocabulary" {[https://api.gbif.org/v1/vocabularies/EstablishmentMeans]} {"dwc:establishmentMeans vocabulary API" [https://api.gbif.org/v1/vocabularies/EstablishmentMeans/concepts]}

to

bdq:sourceAuthority default = "Establishment Means Controlled Vocabulary List of Terms" {[https://dwc.tdwg.org/em/]} {GBIF vocabulary API [https://api.gbif.org/v1/vocabularies/EstablishmentMeans/concepts]}

chicoreus commented 4 months ago

See https://github.com/tdwg/bdq/issues/275#issuecomment-2061845648

The GBIF API does not help here. It does not provide the actual controlled values from the TDWG vocabulary; the values it has differ in case.
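A mismatch of this kind can be detected mechanically before relying on an alternate source for a case-sensitive test. The sketch below is a hypothetical diagnostic, not existing BDQ or GBIF code: it compares the values from the standard with the values from another source and reports pairs that match only when case is ignored. The sample values in the usage line are illustrative, not actual GBIF data.

```python
from typing import Dict, Iterable, List, Tuple


def case_only_differences(standard: Iterable[str],
                          other: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (standard_value, other_value) pairs that match only case-insensitively."""
    other_by_lower: Dict[str, str] = {v.lower(): v for v in other}
    pairs = []
    for value in standard:
        candidate = other_by_lower.get(value.lower())
        # Same word ignoring case, but not an exact match.
        if candidate is not None and candidate != value:
            pairs.append((value, candidate))
    return sorted(pairs)


print(case_only_differences(["native", "introduced"], ["Native", "introduced"]))
# [('native', 'Native')]
```

An empty result means every value shared by the two sources matches exactly, which is the condition under which the alternate source could stand in for the TDWG list in an exact-match validation.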

Tasilee commented 4 months ago

GBIF vocabulary has now been aligned with Darwin Core. Thanks @timrobertson100