Closed by turbomam.
As an alternative to maintaining a set of invalid examples over time, maybe this repository could contain only valid examples, and the invalid ones could be generated on demand from those valid ones, the latest schema, and a script that breaks slots according to some rules (e.g. if the schema specifies that a slot contains a string, store the number 1 in it; if the schema specifies that a slot is required, delete it; etc.). In that case, the test procedure would be:
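A minimal sketch of such a rule-driven script, assuming the slot definitions have already been reduced to a plain dictionary of ranges and required flags. The `SCHEMA_SLOTS` layout, the `break_slots` function, and the example values are hypothetical; a real version would read the slot definitions from the LinkML schema itself. Each yielded example breaks exactly one slot, so each should fail validation for exactly one reason.

```python
import copy

# Hypothetical, simplified slot definitions standing in for the real schema.
SCHEMA_SLOTS = {
    "id": {"range": "string", "required": True},
    "name": {"range": "string", "required": False},
    "depth": {"range": "float", "required": False},
}

def break_slots(valid_example, schema_slots):
    """Yield (label, invalid_example) pairs, each breaking exactly one slot."""
    for slot, spec in schema_slots.items():
        if slot not in valid_example:
            continue  # nothing to break if the valid example omits this slot
        # Rule: a string-valued slot gets the number 1 instead.
        if spec.get("range") == "string":
            broken = copy.deepcopy(valid_example)
            broken[slot] = 1
            yield f"{slot}-not_a_string", broken
        # Rule: a required slot is deleted.
        if spec.get("required"):
            broken = copy.deepcopy(valid_example)
            del broken[slot]
            yield f"{slot}-missing_required", broken

valid = {"id": "example:001", "name": "sample A", "depth": 1.5}
for label, example in break_slots(valid, SCHEMA_SLOTS):
    print(label, example)
```

Copying the valid example before each mutation keeps the source file untouched, so the same valid example can seed any number of invalid variants.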
Generating test data programmatically (i.e. what I am describing here) does have some "code smell" to me.
Is there a way to get the validator to count the violations in a file?
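One way to get a count is to have the check collect every violation into a list rather than reporting a single pass/fail result, then take the list's length. The sketch below does that against a toy schema; the `collect_violations` function and the `SCHEMA_SLOTS` layout are hypothetical stand-ins, not part of `linkml-validate` or any other existing validator.

```python
# Hypothetical simplified schema: slot -> expected Python type and requiredness.
SCHEMA_SLOTS = {
    "id": {"type": str, "required": True},
    "name": {"type": str, "required": False},
    "depth": {"type": float, "required": False},
}

def collect_violations(record, schema_slots):
    """Return every violation in one record instead of stopping at the first."""
    violations = []
    for slot, spec in schema_slots.items():
        if slot not in record:
            if spec["required"]:
                violations.append(f"{slot}: required slot is missing")
            continue
        if not isinstance(record[slot], spec["type"]):
            violations.append(
                f"{slot}: expected {spec['type'].__name__}, "
                f"got {type(record[slot]).__name__}"
            )
    return violations

record = {"name": 1, "depth": 2.0}
problems = collect_violations(record, SCHEMA_SLOTS)
print(len(problems))  # 2: id is missing, and name is an int
```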
Here's an open source tool (happens to be web-based) that people can use to generate data that is valid with respect to a given JSON Schema.
https://json-schema-faker.js.org/ (GitHub repo)
This tool generates valid data, but this GitHub Issue is about invalid data.
A tool that generates data that is valid, except for one slot, could be built upon this tool (e.g. do what this tool does, then target a single slot and change its value to be something that the schema says is not allowed there).
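The "valid except for one slot" idea could look roughly like this, assuming a valid instance has already been produced (for example by json-schema-faker) and that the targeted property declares a single JSON Schema `type`. The `WRONG_VALUE_FOR_TYPE` table and the `break_one_property` function are hypothetical; only the `type` keyword is handled, so constraints such as `pattern` or `enum` would need rules of their own.

```python
import copy
import json

# A value of the wrong JSON type for each single JSON Schema "type" keyword.
WRONG_VALUE_FOR_TYPE = {
    "string": 1,
    "integer": "not an integer",
    "number": "not a number",
    "boolean": "not a boolean",
    "array": {"not": "an array"},
    "object": ["not", "an", "object"],
}

def break_one_property(valid_instance, json_schema, prop):
    """Copy a valid instance and give one property a value its type forbids."""
    declared = json_schema["properties"][prop].get("type")
    # Lists of types (e.g. ["string", "null"]) and missing types are not handled.
    if not isinstance(declared, str) or declared not in WRONG_VALUE_FOR_TYPE:
        raise ValueError(f"no mutation rule for type {declared!r} on {prop!r}")
    broken = copy.deepcopy(valid_instance)
    broken[prop] = WRONG_VALUE_FOR_TYPE[declared]
    return broken

schema = {
    "type": "object",
    "properties": {"id": {"type": "string"}, "count": {"type": "integer"}},
    "required": ["id"],
}
valid = {"id": "abc", "count": 3}  # stands in for generator output
print(json.dumps(break_one_property(valid, schema, "count")))
```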
I suggest making this part of the PR template (https://github.com/microbiomedata/nmdc-schema/issues/1995).
It's obvious if a valid example file starts to fail, because `make test` won't complete. However, we don't have any mechanism for checking whether an invalid example becomes "more invalid".
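One possible mechanism, sketched under the assumption that the number of errors each invalid file produces can be captured somehow (for instance by counting the validator's error lines): record the expected count per file as a baseline and flag any file whose count changes. The file names and the `find_count_changes` function are illustrative only.

```python
def find_count_changes(expected_counts, observed_counts):
    """Compare observed error counts per invalid file against a recorded baseline.

    Returns {filename: (expected, observed)} for every file whose count changed,
    so an example that became "more invalid" (or stopped failing) is flagged.
    """
    changed = {}
    for filename, expected in expected_counts.items():
        # A file absent from the observed results is treated as producing 0 errors.
        observed = observed_counts.get(filename, 0)
        if observed != expected:
            changed[filename] = (expected, observed)
    # Files with no baseline entry yet are reported too, with expected=None.
    for filename in observed_counts.keys() - expected_counts.keys():
        changed[filename] = (None, observed_counts[filename])
    return changed

baseline = {"Example-missing_id.yaml": 1, "Example-bad_depth.yaml": 1}
latest = {"Example-missing_id.yaml": 1, "Example-bad_depth.yaml": 3}
print(find_count_changes(baseline, latest))
# {'Example-bad_depth.yaml': (1, 3)}
```

Checking for any change, not just an increase, also catches an invalid example that silently starts passing.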
Rules 2 and 3 can be checked with `linkml-validate`, like this:

Only one ERROR should be reported, and it should agree with the portion of the filename after the first hyphen.
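That rule could be automated roughly as follows. This sketch assumes the validator prints one line containing `ERROR` per violation, which is an assumption about the output format rather than documented behavior. It only extracts the expected problem from the filename; judging whether the error message actually agrees with it is still left to a person. The function name, file path, and sample output are hypothetical.

```python
from pathlib import Path

def check_invalid_example(filename, validator_output):
    """Check one invalid example against the 'exactly one ERROR' rule.

    Assumes (not verified against linkml-validate) that each violation is
    reported on its own line containing the word ERROR.
    Returns (passes_rule, expected_problem, error_lines).
    """
    stem = Path(filename).stem  # drop directories and the file extension
    _, hyphen, expected_problem = stem.partition("-")  # split at the first hyphen
    if not hyphen:
        raise ValueError(f"{filename!r} has no hyphen to split on")
    error_lines = [line for line in validator_output.splitlines() if "ERROR" in line]
    return len(error_lines) == 1, expected_problem, error_lines

# Made-up output, for illustration only.
sample_output = "ERROR: required slot 'id' is missing\n"
ok, expected, lines = check_invalid_example("invalid/Example-missing_id.yaml", sample_output)
print(ok, expected)  # True missing_id
```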