joesingo / compliance-checker

Python tool to check your datasets vs compliance standards
Apache License 2.0

Support YAML descriptions of check suites #1

Open joesingo opened 6 years ago

joesingo commented 6 years ago

Following Ag's work, change the compliance-checker code to support reading a YAML file that describes a suite of checks.

Basic idea:

Implementation ideas/questions:

@agstephens Have I got the basic idea above correct? Is there anything missing?

joesingo commented 6 years ago

Fairly rough first attempt at this here: c1a923f569141f8ccb818a2bcb6891c960f8f16a. Currently it uses a hardcoded check method that checks if the dataset filename contains a certain string as a substring (could be anything but this is simple enough to test easily).
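For illustration only, the hardcoded substring check described here might look roughly like the sketch below. The `CheckResult` tuple and the function name `filename_contains` are hypothetical stand-ins, not the names used in the commit; only the failure message is modelled on the output shown in this thread.

```python
from collections import namedtuple

# Hypothetical result type; the real checker's result class may differ.
CheckResult = namedtuple("CheckResult", ["score", "out_of", "messages"])


def filename_contains(filename, string):
    """Score 1/1 if `string` occurs in `filename`, else 0/1 with a reason."""
    if string in filename:
        return CheckResult(1, 1, [])
    msg = "String '{}' was not found in filename '{}'".format(string, filename)
    return CheckResult(0, 1, [msg])


# A failing and a passing case, using the dataset path from this thread.
print(filename_contains("/home/ubuntu/good_file", "bad"))
print(filename_contains("/home/ubuntu/good_file", "good"))
```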

So far there are far fewer options available in the YAML config than in Ag's existing work (e.g. https://github.com/cedadev/compliance-check-maker/blob/master/project/ukcp18/file-info.yml).

Usage is as follows:

my-test.yaml

suite_name: "my_test_suite"

checks:
  - check_id:    "first_check"
    check_level: "HIGH"
    params:      {"string": "bad"}

  - check_id: "second_check"
    params: {"string": "good"}

Then run

cchecker.py --yaml-test my-test.yaml --test my_test_suite ~/good_file

Output should be something like the following:

2017-12-14T16:19:46.327647 [INFO] :: ESDOC-PYESSV :: Loading vocabularies from /home/ubuntu/.esdoc/pyessv-archive:
Running Compliance Checker on the dataset from: /home/ubuntu/good_file

--------------------------------------------------------------------------------
                      The dataset scored 1 out of 2 points
                         during the my_test_suite check
--------------------------------------------------------------------------------
                               Scoring Breakdown:

                                 High Priority
--------------------------------------------------------------------------------
    Name                            :Priority: Score
check_first_check                       :3:     0/1

                                Medium Priority
--------------------------------------------------------------------------------
    Name                            :Priority: Score
check_second_check                      :2:     1/1

--------------------------------------------------------------------------------
                  Reasoning for the failed tests given below:

Name                             Priority:     Score:Reasoning
--------------------------------------------------------------------------------
check_first_check                      :3:     0/ 1 : String 'bad' was not found
                                                      in filename
                                                      '/home/ubuntu/good_file'

If this is on the right lines, I suppose the next step would be to specify the base check in the YAML instead of using a hardcoded check.
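As a rough sketch of how the config above could map onto the report, the snippet below turns the parsed YAML into named check methods. It works on a plain dict (what `yaml.safe_load` would return for my-test.yaml) so it runs without PyYAML. The function `build_suite`, the `check_` prefix logic, and the `MEDIUM` default are assumptions inferred from the output above (where `second_check` has no `check_level` and is listed under Medium Priority); the value for `LOW` is a guess.

```python
# Parsed form of my-test.yaml, i.e. what yaml.safe_load() would return.
config = {
    "suite_name": "my_test_suite",
    "checks": [
        {"check_id": "first_check", "check_level": "HIGH", "params": {"string": "bad"}},
        {"check_id": "second_check", "params": {"string": "good"}},
    ],
}

# Priority numbers as they appear in the report (3 = high, 2 = medium).
# LOW = 1 is an assumption; it does not appear in the output.
LEVELS = {"HIGH": 3, "MEDIUM": 2, "LOW": 1}


def build_suite(config):
    """Return (suite_name, {method_name: (priority, params)})."""
    methods = {}
    for check in config["checks"]:
        # Assumed default: checks without a check_level are treated as MEDIUM.
        level = check.get("check_level", "MEDIUM")
        method_name = "check_" + check["check_id"]
        methods[method_name] = (LEVELS[level], check.get("params", {}))
    return config["suite_name"], methods


name, methods = build_suite(config)
for method_name, (priority, params) in methods.items():
    print(name, method_name, priority, params)
```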

agstephens commented 6 years ago

@joesingo: your suggested approach looks good. I suspect that we might get feedback from the IOOS guys later that will require some tweaking - but looks like a sound plan for the prototype.

joesingo commented 6 years ago

As of 4d9ceeaa64e1413bd7ce69d7979b54b7649c1e42 I have removed myBaseParametrisableCheck class and moved in CallableCheckBase and subclasses directly from compliance-check-lib.

YAML config can now look something like:

suite_name: "my_test_suite"

checks:
  - check_id:       "fi01"
    check_name:     "compliance_checker.file_checks.FileSizeCheck"
    modifiers:      {'threshold': 2, 'strictness': 'soft'}
    comments:       "This is an advisory check"

  - check_id:       "fi02"
    check_name:     "compliance_checker.file_checks.FileSizeCheck"
    check_level:    "HIGH"
    modifiers:      {'threshold': 4, 'strictness': 'hard'}
    comments:       "This is a strict check"

  - check_id:       "fi03"
    check_name:     "compliance_checker.file_checks.FileNameStructureCheck"
    check_level:    "HIGH"
    modifiers:      {'delimiter': '_', 'extension': '.nc'}

The check_name field can point to any subclass of CallableCheckBase that is importable when cchecker.py is run.
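One way a dotted `check_name` could be resolved to a class is with `importlib`, sketched below. This is not necessarily how the commit does it; `load_check_class` and its `base` parameter are hypothetical names. The demo uses a standard-library class so it runs without compliance-checker installed; in the checker itself `base` would presumably be `CallableCheckBase`.

```python
import importlib


def load_check_class(dotted_path, base=object):
    """Import 'package.module.ClassName' and return the class object."""
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError("Expected 'module.ClassName', got {!r}".format(dotted_path))
    # Raises ImportError if the module is not importable,
    # AttributeError if the module has no such attribute.
    module = importlib.import_module(module_path)
    cls = getattr(module, class_name)
    if not (isinstance(cls, type) and issubclass(cls, base)):
        raise TypeError("{} is not a subclass of {}".format(dotted_path, base.__name__))
    return cls


# Stand-in demo: OrderedDict is a dict subclass, so this lookup succeeds.
print(load_check_class("collections.OrderedDict", base=dict))
```

Rejecting classes that do not derive from the expected base at load time means a typo in the YAML fails early, before any dataset is opened.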

joesingo commented 6 years ago

Notes from discussion with Ag: