Open Tasilee opened 6 years ago
@chicoreus added a document on doing this to the BDQ Wiki (https://github.com/tdwg/bdq/wiki/TG2---Proposal-for-identifying-synthetic-data)
I've got a set of date test data I could check in to the tdwg/bdq repository.
Discussed at 5:30 meeting, TDWG 2018
Approach: (1) assemble test data by checking it into the tdwg/bdq GitHub repository; (2) combine the existing test data set with data compiled from aggregators that fails one and only one test; (3) produce two test sets: first, a set of minimal atomic field(s):value(s):response[result,status] values that implementers could use to build unit tests, and second, an integration test data set consisting of Darwin Core records and a matching result set to test whether a tool produces the expected set of framework results.
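To illustrate point (3), here is a minimal sketch of how an implementer might consume the atomic field(s):value(s):response[result,status] data as unit test fixtures. The CSV column names, the test id, and the toy NOTEMPTY validation below are hypothetical, not the actual layout or tests in the tdwg/bdq repository:

```python
import csv
import io

# Hypothetical layout for the atomic test data; real column names
# and test labels in tdwg/bdq may differ.
SAMPLE = """test_id,input_field,input_value,expected_status,expected_result
VALIDATION_EVENTDATE_NOTEMPTY,dwc:eventDate,,RUN_HAS_RESULT,NOT_COMPLIANT
VALIDATION_EVENTDATE_NOTEMPTY,dwc:eventDate,1880-05-08,RUN_HAS_RESULT,COMPLIANT
"""

def load_cases(text):
    """Read atomic field:value:response rows into dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def validate_eventdate_notempty(value):
    """Toy stand-in for an implementation under test: returns a
    framework-style (status, result) response pair."""
    result = "COMPLIANT" if value.strip() else "NOT_COMPLIANT"
    return {"status": "RUN_HAS_RESULT", "result": result}

def run_cases(cases):
    """Compare the implementation's responses against the expected
    values from the data set; return ids of any failing cases."""
    failures = []
    for case in cases:
        got = validate_eventdate_notempty(case["input_value"])
        expected = (case["expected_status"], case["expected_result"])
        if (got["status"], got["result"]) != expected:
            failures.append(case["test_id"])
    return failures

print(run_cases(load_cases(SAMPLE)))  # [] when every case matches
```

Because each row exercises exactly one test with one input, a failure points directly at either the implementation or the test specification for that single case.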
To work on this @Tasilee, @tucotuco, @chicoreus, @ArthurChapman
This has now been done with 18 iterations.
We've only run the validation data against implementations of TIME tests, and in doing that we found multiple issues with the test data and with the test specifications as well as the implementation. Can't really close this until we've run the validation data against implementations of SPACE and NAME tests as well.
The test data has now been added to https://github.com/tdwg/bdq/tree/master/tg2/_review/docs/implementers. As of this writing, only a few outstanding issues remain.
This is a placeholder for the task of generating the test data that is REQUIRED to validate the CORE tests/assertions.
There have been discussions about the use of artificial versus real occurrence data records. I believe we have reached the point where we agree that a combination of both would be useful. Why? Real data is handy as a reality check, but we realise that real data may not be ideal for exercising the tests in a systematic, atomistic way. Hence, a hybrid strategy seems likely.
Please add yourself or others. They can opt out as needed.