psych-ds / psychds-validator

Validator tools for the psych-DS specification

Integrating priority flow into schema model #28

Closed bleonar5 closed 3 months ago

bleonar5 commented 6 months ago

TL;DR

Schema model and validator should include a notion of "priority flow" so that certain errors and warnings only appear if other errors and warnings are not present.

For instance, if there is no dataset_description file present in the dataset, we don't want to inundate the user with all the downstream issues that this will raise (e.g. missing column issues, missing metadata field issues, etc.).

We've discussed systems of varying complexity to address this, but the one I believe we landed on relies on a simple notion of presence/absence for different files. That is, every rule and requirement specified in the schema model should include a field indicating which file, directory, or correctly structured object that rule depends on. If any of those elements is not present, the rules and requirements that depend on it will not raise issues.
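To make the presence/absence idea concrete, here is a rough sketch of how it might look. Everything in it is an assumption for illustration: the `dependsOn` field name, the `Rule` and `Issue` shapes, the `partitionIssues` helper, and the `MISSING_COLUMN` and `MISSING_METADATA_FIELD` codes are hypothetical and do not come from the current schema model or validator.

```typescript
// Hypothetical shapes; none of these names come from the psych-DS schema.
interface Rule {
  code: string;
  message: string;
  // Assumed field: elements (files, directories, objects) this rule depends on.
  dependsOn: string[];
}

interface Issue {
  code: string;
  message: string;
}

// Split the rules that failed into issues to report and issues to hold back
// because at least one element they depend on is absent.
function partitionIssues(
  failed: Rule[],
  present: Set<string>,
): { reported: Issue[]; suppressed: Issue[] } {
  const reported: Issue[] = [];
  const suppressed: Issue[] = [];
  for (const rule of failed) {
    const issue: Issue = { code: rule.code, message: rule.message };
    const blocked = rule.dependsOn.some((dep) => !present.has(dep));
    if (blocked) {
      suppressed.push(issue);
    } else {
      reported.push(issue);
    }
  }
  return { reported, suppressed };
}

// Example: dataset_description.json is absent, so rules that depend on it
// are held back and only the root issue is reported.
const failedRules: Rule[] = [
  {
    code: "MISSING_DATASET_DESCRIPTION",
    message: "It is required to include a 'dataset_description.json' in the base directory",
    dependsOn: [],
  },
  {
    code: "MISSING_COLUMN", // hypothetical issue code
    message: "A column is not described in the metadata",
    dependsOn: ["dataset_description.json"],
  },
  {
    code: "MISSING_METADATA_FIELD", // hypothetical issue code
    message: "A required metadata field is missing",
    dependsOn: ["dataset_description.json"],
  },
];

const result = partitionIssues(failedRules, new Set<string>(["data"]));
console.log(result.reported.map((i) => i.code));
console.log(result.suppressed.map((i) => i.code));
```

Keeping the suppressed issues in a separate list, rather than dropping them, means they can still be counted or shown later.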

Since these issues will still technically be valid/true, it may be advisable to include language in the validator output that indicates that not all errors/issues incurred will necessarily be displayed. Perhaps we could collate these messages with the main absence/presence issues themselves, with something like this:

    [ERROR] It is required to include a 'dataset_description.json' in the base directory (MISSING_DATASET_DESCRIPTION)
                NOTE: the validator detected 7 additional issues that are potentially downstream of this one. Providing a valid dataset_description object may or may not resolve these issues.

Once this is implemented, it would also be good to keep the full set of downstream issues available on request, either by rolling it into the --verbose flag or by adding a dedicated flag such as --display-downstream-issues.
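A possible sketch of the output side, under the same caveats: `GroupedIssue`, `formatIssue`, and the `showDownstream` parameter are hypothetical names, and how that boolean would be wired to --verbose or a new flag is left open here.

```typescript
// Hypothetical shapes for illustration only.
interface ReportedIssue {
  severity: "ERROR" | "WARNING";
  code: string;
  message: string;
}

interface GroupedIssue {
  root: ReportedIssue;
  // Issues held back because they may be caused by the root issue.
  downstream: ReportedIssue[];
}

// Render one root issue. By default, downstream issues are summarized in a
// NOTE line; when showDownstream is true they are listed in full instead.
function formatIssue(group: GroupedIssue, showDownstream: boolean): string {
  const { root, downstream } = group;
  const lines = [`[${root.severity}] ${root.message} (${root.code})`];
  const n = downstream.length;
  if (n > 0 && !showDownstream) {
    const noun = n === 1 ? "issue that is" : "issues that are";
    lines.push(
      `    NOTE: the validator detected ${n} additional ${noun} potentially downstream of this one.`,
    );
  }
  if (showDownstream) {
    for (const d of downstream) {
      lines.push(`    [${d.severity}] ${d.message} (${d.code})`);
    }
  }
  return lines.join("\n");
}

const group: GroupedIssue = {
  root: {
    severity: "ERROR",
    code: "MISSING_DATASET_DESCRIPTION",
    message: "It is required to include a 'dataset_description.json' in the base directory",
  },
  downstream: [
    // Hypothetical downstream issue codes.
    { severity: "ERROR", code: "MISSING_COLUMN", message: "A column is not described in the metadata" },
    { severity: "WARNING", code: "MISSING_METADATA_FIELD", message: "A recommended metadata field is missing" },
  ],
};

const summary = formatIssue(group, false);
const full = formatIssue(group, true);
console.log(summary);
console.log(full);
```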