tdwg / bdq

Biodiversity Data Quality (BDQ) Interest Group
https://github.com/tdwg/bdq

TG2 - Time to develop test data for validating the CORE tests #151

Open Tasilee opened 6 years ago

Tasilee commented 6 years ago

This is a placeholder for the task of generating the test data that is REQUIRED to validate the CORE tests/assertions.

There had been discussions about the use of artificial or real occurrence data records. I believe we have got to the point where we agree that a combination of both would be useful. Why? Real data is handy as a reality check, but we realise that real data may not be ideal for exercising the tests in a systematic, atomistic way. Hence, a hybrid strategy seems likely.

Please add yourself or others. They can opt out as needed.

ArthurChapman commented 6 years ago

@chicoreus added a document on doing this to the BDQ Wiki (https://github.com/tdwg/bdq/wiki/TG2---Proposal-for-identifying-synthetic-data).

chicoreus commented 6 years ago

I've got a set of test date data I could check in to the tdwg/bdq repository.

chicoreus commented 6 years ago

Discussed at 5:30 meeting, TDWG 2018

Approach: (1) Assemble test data by checking it into the tdwg/bdq GitHub repository. (2) Combine the existing test data set with data compiled from aggregators that fail one and only one test. (3) Produce two test sets: first, a set of minimal atomic field(s):value(s):response[result,status] values that implementers could use to build unit tests; second, an integration test data set consisting of Darwin Core records and a matching result set, to test whether a tool produces the expected set of framework results.
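To make the first kind of test set in (3) concrete, here is a rough sketch of how atomic field(s):value(s):response[result,status] rows might drive unit tests. Everything in it is an assumption for illustration: the `AtomicCase` structure, the `toy_day_validation` function, the example rows, and the status/result labels are hypothetical stand-ins, not the actual test data format or any CORE test specification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# (status, result) pair; the labels used below are illustrative assumptions.
Response = Tuple[str, str]


@dataclass(frozen=True)
class AtomicCase:
    """One minimal row: input field values plus the expected response."""
    fields: Dict[str, str]
    expected_status: str
    expected_result: str


def toy_day_validation(fields: Dict[str, str]) -> Response:
    """Hypothetical stand-in for a real test: is dwc:day an integer in 1..31?"""
    day = fields.get("dwc:day", "").strip()
    if not day:
        # Nothing to evaluate, so no result is reported.
        return ("INTERNAL_PREREQUISITES_NOT_MET", "")
    if day.isdecimal() and 1 <= int(day) <= 31:
        return ("RUN_HAS_RESULT", "COMPLIANT")
    return ("RUN_HAS_RESULT", "NOT_COMPLIANT")


# Made-up example rows, one per expected outcome.
CASES: List[AtomicCase] = [
    AtomicCase({"dwc:day": "15"}, "RUN_HAS_RESULT", "COMPLIANT"),
    AtomicCase({"dwc:day": "32"}, "RUN_HAS_RESULT", "NOT_COMPLIANT"),
    AtomicCase({"dwc:day": "x"}, "RUN_HAS_RESULT", "NOT_COMPLIANT"),
    AtomicCase({"dwc:day": ""}, "INTERNAL_PREREQUISITES_NOT_MET", ""),
]


def run_cases(
    test: Callable[[Dict[str, str]], Response],
    cases: List[AtomicCase],
) -> List[Tuple[AtomicCase, Response]]:
    """Return every case whose actual response differs from the expected one."""
    failures = []
    for case in cases:
        actual = test(case.fields)
        if actual != (case.expected_status, case.expected_result):
            failures.append((case, actual))
    return failures


if __name__ == "__main__":
    for case, actual in run_cases(toy_day_validation, CASES):
        print("MISMATCH", case.fields, "got", actual)
```

An empty list from `run_cases` means the implementation agrees with every row. A mismatch could point at the implementation, the test data, or the specification, so each one would need a look by hand.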

To work on this @Tasilee, @tucotuco, @chicoreus, @ArthurChapman

Tasilee commented 2 years ago

This has now been done with 18 iterations.

chicoreus commented 2 years ago

We've only run the validation data against implementations of TIME tests, and in doing that we found multiple issues with the test data and with the test specifications as well as the implementation. Can't really close this until we've run the validation data against implementations of SPACE and NAME tests as well.

ArthurChapman commented 1 month ago

The test data has now been added to https://github.com/tdwg/bdq/tree/master/tg2/_review/docs/implementers. As of writing, only a few outstanding issues remain.