rapidsai / dependency-file-generator

https://pypi.org/project/rapids-dependency-file-generator/
Apache License 2.0

feat: Validate dependencies.yaml using jsonschema #29

Closed vyasr closed 1 year ago

vyasr commented 1 year ago

This PR enables validating the contents of a dependencies.yaml file directly, without doing any processing. The schema is written in JSON Schema and checked with the Python jsonschema package. The new Python code is fairly minimal; it would be even shorter, but I used the object-oriented API to report every error in a file rather than only the first one, which is all jsonschema.validate shows. Most of the new lines are the schema definition. Validation is wired into normal CLI usage, so a dependencies.yaml file is always validated against the schema before dependency files are generated. Developers therefore see useful errors about why their dependencies.yaml file is invalid rather than opaque runtime errors when dfg fails to use the file.
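For readers unfamiliar with the pattern, here is a minimal sketch of collecting every error through a validator object instead of stopping at the first one. The schema is a hypothetical, trimmed-down stand-in, and the names SCHEMA, collect_errors, and bad, as well as the choice of Draft7Validator, are assumptions for illustration rather than the code in this PR.

```python
import jsonschema

# Hypothetical, heavily simplified schema; the real schema is larger.
SCHEMA = {
    "type": "object",
    "properties": {
        "files": {
            "type": "object",
            "patternProperties": {
                ".*": {
                    "type": "object",
                    "properties": {
                        "output": {"type": ["string", "array"]},
                        "includes": {"type": "array", "items": {"type": "string"}},
                    },
                    "required": ["output", "includes"],
                }
            },
        },
        "channels": {"type": ["array", "string"]},
    },
}


def collect_errors(data, schema=SCHEMA):
    """Return one message per validation error, not just the first."""
    # Draft7Validator is an assumption; any validator class that exposes
    # iter_errors would work the same way.
    validator = jsonschema.Draft7Validator(schema)
    errors = sorted(
        validator.iter_errors(data),
        key=lambda e: [str(p) for p in e.absolute_path],
    )
    return [
        f"{'/'.join(str(p) for p in e.absolute_path)}: {e.message}"
        for e in errors
    ]


# Illustrative input with two problems: a missing 'output' key and a
# non-list, non-string 'channels' value.
bad = {
    "files": {"all": {"includes": ["cudatoolkit"]}},
    "channels": 1234,
}

for i, msg in enumerate(collect_errors(bad), start=1):
    print(f"Error #{i}: {msg}")
```

By contrast, jsonschema.validate raises on the first problem it reports, so a file with several mistakes would have to be fixed and rerun one error at a time.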

vyasr commented 1 year ago

As an example, applying this patch:

--- a/tests/examples/no-specific-match/dependencies.yaml
+++ b/tests/examples/no-specific-match/dependencies.yaml
@@ -1,14 +1,11 @@
 files:
   all:
-    output: conda
     requirements_dir: output/actual
     matrix:
       cuda: ["11.8"]
     includes:
       - cudatoolkit
-channels:
-  - rapidsai
-  - conda-forge
+channels: 1234
 dependencies:
   cudatoolkit:
     specific:

and rerunning the tests results in:

------------------------------------------------------------------------------------------ Captured stderr call -------------------------------------------------------------------------------------------
Error #1:
        'output' is a required property

        Failed validating 'required' in schema['properties']['files']['patternProperties']['.*']:
            {'properties': {'conda_dir': {'type': 'string'},
                            'includes': {'items': {'type': 'string'},
                                         'type': 'array'},
                            'matrix': {'type': 'object'},
                            'output': {'if': {'type': 'array'},
                                       'then': {'items': {'type': 'string'}},
                                       'type': ['string', 'array']},
                            'requirements_dir': {'type': 'string'}},
             'required': ['output', 'includes'],
             'type': 'object'}

        On instance['files']['all']:
            {'includes': ['cudatoolkit'],
             'matrix': {'cuda': ['11.8']},
             'requirements_dir': 'output/actual'}
Error #2:
        1234 is not of type 'array', 'string'

        Failed validating 'type' in schema['properties']['channels']:
            {'if': {'type': 'array'},
             'then': {'items': {'type': 'string'}},
             'type': ['array', 'string']}

        On instance['channels']:
            1234
========================================================================================= short test summary info =========================================================================================
FAILED tests/test_examples.py::test_error_examples[no-specific-match] - RuntimeError: The provided dependencies data is invalid.
====================================================================================== 1 failed, 13 passed in 0.53s =======================================================================================

vyasr commented 1 year ago

Just making some notes of my planned next steps here to help reviewers understand where I was going with this and identify any blind spots I may have:

vyasr commented 1 year ago

Thanks to a makeover from @csadorf I think this PR is ready for review. @ajschmidt8 let me know what you think of it.

One minor note: I also snuck in an isort bugfix.

github-actions[bot] commented 1 year ago

:tada: This PR is included in version 1.1.0 :tada:

The release is available on:

Your semantic-release bot :package::rocket: