tdwg / bdq

Biodiversity Data Quality (BDQ) Interest Group
https://github.com/tdwg/bdq

TG2 - Centralized Sandbox for Test suite implementation suggestion #154

Open pbrenton opened 6 years ago

pbrenton commented 6 years ago

Proposal: I'd suggest that GBIF develop a common DwC test suite with REST APIs against which anyone in the world can bounce their data and get appropriate results.

Benefits: This approach would:

  1. allow the governance process around the test suite to be implemented at the same time and consistently for all consumers of it.
  2. reduce implementation costs for members, who would no longer each have to build their own local implementation.
  3. ensure that tests are implemented consistently for everyone and that they are consistently repeatable.

MattBlissett commented 6 years ago

Do you mean a service to run the data quality tests against a DwC archive?

We already have the GBIF Data Validator (source code) which has a REST API (dataset-based) and runs a dataset through the same interpretation / validation / quality checks as data going into GBIF. So, as we implement the data quality checks, they will be available in this validator as well as the GBIF data indexing system.

(As the page says, this is an early implementation of the validator, so the API may still change as required.)

chicoreus commented 6 years ago

Duplicate of #151 per @Tasilee

chicoreus commented 6 years ago

Discussed at the 5:30 meeting at TDWG 2018; reopening as a separate issue. This issue covers a web service to run the tests; #151 covers the test data needed to make the web service (and other implementations) consistent in output. @tucotuco notes that this is Kurator with a web service over it.
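To make the split between this issue and #151 concrete, here is a rough Python sketch of how shared test data could be used to check that any implementation (a web service or a local library) gives consistent output. Everything in it is an assumption for illustration: the function names, the result labels, and the example cases are hypothetical and are not taken from the BDQ test definitions, the GBIF validator, or Kurator. Only `decimalLatitude` is a real Darwin Core term.

```python
import json

def check_latitude_in_range(record):
    """Hypothetical test: is decimalLatitude a number between -90 and 90?"""
    raw = record.get("decimalLatitude")
    if raw is None or str(raw).strip() == "":
        # Nothing to evaluate, so the test cannot be run on this record.
        return "PREREQUISITES_NOT_MET"
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return "NOT_COMPLIANT"
    return "COMPLIANT" if -90.0 <= value <= 90.0 else "NOT_COMPLIANT"

# Hypothetical shared test data: (input record, expected result) pairs.
SHARED_CASES = [
    ({"decimalLatitude": "45.5"}, "COMPLIANT"),
    ({"decimalLatitude": "95"}, "NOT_COMPLIANT"),
    ({"decimalLatitude": "north"}, "NOT_COMPLIANT"),
    ({}, "PREREQUISITES_NOT_MET"),
]

def disagreements(implementation, cases):
    """List every case where an implementation departs from the expected result."""
    problems = []
    for record, expected in cases:
        actual = implementation(record)
        if actual != expected:
            problems.append(
                {"record": record, "expected": expected, "actual": actual}
            )
    return problems

if __name__ == "__main__":
    # An empty list means the implementation agrees with the shared data.
    print(json.dumps(disagreements(check_latitude_in_range, SHARED_CASES)))
```

A centralized service would run functions like `check_latitude_in_range` behind its API, while the shared cases would let independent implementations confirm they return the same answers for the same records.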