Open joesingo opened 6 years ago
Fairly rough first attempt at this here: c1a923f569141f8ccb818a2bcb6891c960f8f16a. Currently it uses a hardcoded check method that checks if the dataset filename contains a certain string as a substring (could be anything but this is simple enough to test easily).
So far there are far fewer options available in the YAML config compared to Ag's existing work (e.g. https://github.com/cedadev/compliance-check-maker/blob/master/project/ukcp18/file-info.yml).
Usage is as follows:
my-test.yml
suite_name: "my_test_suite"
checks:
  - check_id: "first_check"
    check_level: "HIGH"
    params: {"string": "bad"}
  - check_id: "second_check"
    params: {"string": "good"}
Then run
cchecker.py --yaml-test my-test.yml --test my_test_suite ~/good_file
Output should be something like the following:
2017-12-14T16:19:46.327647 [INFO] :: ESDOC-PYESSV :: Loading vocabularies from /home/ubuntu/.esdoc/pyessv-archive:
Running Compliance Checker on the dataset from: /home/ubuntu/good_file
--------------------------------------------------------------------------------
The dataset scored 1 out of 2 points
during the my_test_suite check
--------------------------------------------------------------------------------
Scoring Breakdown:
High Priority
--------------------------------------------------------------------------------
Name :Priority: Score
check_first_check :3: 0/1
Medium Priority
--------------------------------------------------------------------------------
Name :Priority: Score
check_second_check :2: 1/1
--------------------------------------------------------------------------------
Reasoning for the failed tests given below:
Name Priority: Score:Reasoning
--------------------------------------------------------------------------------
check_first_check :3: 0/ 1 : String 'bad' was not found
in filename
'/home/ubuntu/good_file'
If this is on the right lines, I suppose the next step would be to specify the base check in the YAML instead of using a hardcoded check.
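For illustration, the approach so far (one generated check method per YAML item, each calling the same hardcoded substring check) could be sketched roughly as below. This is not the code from the commit: `substring_check`, `make_check_method`, `make_suite_class` and the `(score, out_of, message)` return shape are all hypothetical names chosen for the sketch, and the config is written as the dict a YAML parser would return so the example has no dependencies.

```python
# Sketch only: every name here is hypothetical, not taken from the commit.

# The structure a YAML parser would produce for the config above.
config = {
    "suite_name": "my_test_suite",
    "checks": [
        {"check_id": "first_check", "check_level": "HIGH", "params": {"string": "bad"}},
        {"check_id": "second_check", "params": {"string": "good"}},
    ],
}


def substring_check(filename, string):
    """Hardcoded base check: passes if `string` occurs in `filename`.

    Returns (score, out_of, message).
    """
    if string in filename:
        return 1, 1, ""
    return 0, 1, "String '{}' was not found in filename '{}'".format(string, filename)


def make_check_method(check):
    """Build one check method bound to this item's parameters."""
    params = check.get("params", {})

    def method(self, filename):
        return substring_check(filename, **params)

    # Assumption: checks without a check_level default to MEDIUM,
    # as the sample output above suggests.
    method.level = check.get("check_level", "MEDIUM")
    return method


def make_suite_class(config):
    """Create a class with one check_<check_id> method per item in `checks`."""
    attrs = {"suite_name": config["suite_name"]}
    for check in config["checks"]:
        attrs["check_" + check["check_id"]] = make_check_method(check)
    return type("GeneratedSuite", (object,), attrs)


GeneratedSuite = make_suite_class(config)
suite = GeneratedSuite()
results = {
    name: getattr(suite, name)("/home/ubuntu/good_file")
    for name in ("check_first_check", "check_second_check")
}
```

Building each method inside `make_check_method` (rather than directly in the loop) gives every method its own `params`, which avoids the usual late-binding closure problem.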
@joesingo: your suggested approach looks good. I suspect that we might get feedback from the IOOS guys later that will require some tweaking - but looks like a sound plan for the prototype.
As of 4d9ceeaa64e1413bd7ce69d7979b54b7649c1e42 I have removed my BaseParametrisableCheck class and moved in CallableCheckBase and subclasses directly from compliance-check-lib.
YAML config can now look something like:
suite_name: "my_test_suite"
checks:
  - check_id: "fi01"
    check_name: "compliance_checker.file_checks.FileSizeCheck"
    modifiers: {'threshold': 2, 'strictness': 'soft'}
    comments: "This is an advisory check"
  - check_id: "fi02"
    check_name: "compliance_checker.file_checks.FileSizeCheck"
    check_level: "HIGH"
    modifiers: {'threshold': 4, 'strictness': 'hard'}
    comments: "This is a strict check"
  - check_id: "fi03"
    check_name: "compliance_checker.file_checks.FileNameStructureCheck"
    check_level: "HIGH"
    modifiers: {'delimiter': '_', 'extension': '.nc'}
The check_name field can point to any subclass of CallableCheckBase that is importable when cchecker.py is run.
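As a rough sketch of what "importable" means here, a dotted check_name can be resolved to a class with the standard importlib module. The helper name `load_check_class` and its `base` argument are hypothetical, and a standard-library class stands in for a real check so the snippet runs without compliance-checker installed; in practice `base` would be CallableCheckBase.

```python
import importlib


def load_check_class(dotted_name, base=object):
    """Resolve 'package.module.ClassName' to a class and verify its base.

    Hypothetical helper for illustration; not part of compliance-checker.
    """
    module_name, _, class_name = dotted_name.rpartition(".")
    if not module_name:
        raise ValueError("Expected 'module.ClassName', got '{}'".format(dotted_name))

    # Raises ImportError if the module is not on the Python path.
    module = importlib.import_module(module_name)
    # Raises AttributeError if the module has no such attribute.
    cls = getattr(module, class_name)

    if not (isinstance(cls, type) and issubclass(cls, base)):
        raise TypeError("{} is not a subclass of {}".format(dotted_name, base.__name__))
    return cls


# Stand-in example: OrderedDict plays the role of a check class, dict the base.
cls = load_check_class("collections.OrderedDict", base=dict)
```

Checking the base class at load time means a typo in the YAML fails early with a clear error rather than later when the check is run.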
Notes from discussion with Ag:
- Rename __call__ to something like do_check in CallableCheckBase to make things clearer (and rename the class, as it will no longer be callable)
- Move CallableCheckBase and sub-classes back into compliance-check-lib, and make compliance-check-lib a dependency of CC
- compliance-check-lib: if someone wants to create their own checks, they will change the CC-lib code itself
Following Ag's work, change the compliance-checker code to support reading a YAML file that describes a suite of checks.

Basic idea:
- compliance-checker reads a YAML file and dynamically creates a checker class. Each item in the list becomes its own check method that calls the base check with the correct parameters.

Implementation ideas/questions:
- The compliance-checker codebase could contain a class BaseParametrisableCheck. Users would then create subclasses to initialise required parameters, and implement a __call__ method to perform the check on a dataset. (The same idea Ag has already implemented at https://github.com/cedadev/compliance-check-lib/blob/master/checklib/register/callable_check_base.py#L6)
- Use --yaml-check=<filename> to add the generated check, and use the normal --test=<name> with the suite name specified in the YAML.
- How to make compliance-checker aware of user's subclasses? Could use the same approach as plugins for CC (see documentation here), i.e. users would create a separate python project and add an entry point in setup.py

@agstephens Have I got the basic idea above correct? Is there anything missing?
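
To make the BaseParametrisableCheck idea more concrete, here is a minimal sketch of what such a base class and a user subclass might look like. Everything in it is an assumption for discussion: the `required_parameters` attribute, the boolean return value, and the `FilenameContainsCheck` subclass are hypothetical and not taken from compliance-checker or compliance-check-lib.

```python
class BaseParametrisableCheck(object):
    """Hypothetical base class: stores parameters; subclasses implement __call__."""

    # Names of parameters a subclass needs (assumed convention for this sketch).
    required_parameters = ()

    def __init__(self, **params):
        missing = [p for p in self.required_parameters if p not in params]
        if missing:
            raise ValueError("Missing parameters: {}".format(", ".join(missing)))
        self.params = params

    def __call__(self, dataset):
        raise NotImplementedError("Subclasses must implement __call__")


class FilenameContainsCheck(BaseParametrisableCheck):
    """Example user subclass: passes if the dataset filename contains a string."""

    required_parameters = ("string",)

    def __call__(self, dataset):
        # `dataset` is treated as a plain file path here for simplicity.
        return self.params["string"] in dataset


# One instance per YAML item, each initialised with that item's params.
good = FilenameContainsCheck(string="good")
bad = FilenameContainsCheck(string="bad")
outcomes = [check("/home/ubuntu/good_file") for check in (good, bad)]
```

With this shape, a generated check method would only need to instantiate the subclass with the parameters from the YAML and call it on the dataset.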