spotify / luigi

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
Apache License 2.0

Can pass a JSON schema to DictParameter and ListParameter #3217

Closed by adrien-berchet 1 year ago

adrien-berchet commented 1 year ago

Description

Add an optional parameter to DictParameter and ListParameter so the loaded value can be validated against a JSON schema.

Motivation and Context

Adding a simple validation step reduces the amount of checking code needed in the run method of the tasks. The arguments are also checked at the beginning of the workflow, so an invalid value fails immediately instead of partway through the run.

Have you tested this? If so, how?

I added simple tests.

adrien-berchet commented 1 year ago

Note that this PR adds the jsonschema dependency. If you prefer I can make it optional.

adrien-berchet commented 1 year ago

Cool, thanks! OK, I pushed a new commit to make it optional. Now a warning is raised if the user tries to use the new schema parameter while jsonschema is not installed.

adrien-berchet commented 1 year ago

With pleasure :)