Closed russellb closed 4 months ago
I can't say I love how verbose the schema file is...
One thing I have in mind for this is automating the validation of custom pipeline configs in other repos. Those are not as easy to test when code in this repo changes, so I'm hoping this can help catch accidental compatibility breakages that get merged without affecting the in-tree configs but might affect custom ones elsewhere.
When we load a taxonomy yaml file, we validate its contents against a schema contained in `instructlab.schema`. Here is an example schema: https://github.com/instructlab/schema/blob/main/src/instructlab/schema/v2/knowledge.json

There's some code for loading this schema and validating with it in `instructlab.sdg.utils.taxonomy`.

It would be nice if we did something similar when we load a pipeline yaml. It can help catch subtle mistakes. We could also provide instructions for how to check a configuration against the schema manually before trying to run it.
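To make the idea concrete, here is a rough sketch of what a load-time check could look like using the third-party `jsonschema` package. Everything specific in it is an assumption for illustration: the tiny `PIPELINE_SCHEMA`, its field names (`version`, `blocks`, `name`, `type`, `config`), and the helper `pipeline_config_errors` are hypothetical. It is not the schema proposed in this issue, and it is not the existing code in `instructlab.sdg.utils.taxonomy`.

```python
from jsonschema import Draft7Validator

# Hypothetical, deliberately small schema. The field names are assumptions
# made for this example, not the real pipeline config schema.
PIPELINE_SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "required": ["version", "blocks"],
    "properties": {
        "version": {"type": "string"},
        "blocks": {
            "type": "array",
            "minItems": 1,
            "items": {
                "type": "object",
                "required": ["name", "type"],
                "properties": {
                    "name": {"type": "string"},
                    "type": {"type": "string"},
                    "config": {"type": "object"},
                },
            },
        },
    },
}


def pipeline_config_errors(config):
    """Return human-readable schema violations for an already-parsed config.

    `config` is the plain dict/list structure a YAML parser would produce.
    An empty list means the config satisfies PIPELINE_SCHEMA.
    """
    validator = Draft7Validator(PIPELINE_SCHEMA)
    # Collect every violation instead of stopping at the first one.
    errors = sorted(
        validator.iter_errors(config),
        key=lambda err: [str(part) for part in err.absolute_path],
    )
    messages = []
    for err in errors:
        location = "/".join(str(part) for part in err.absolute_path) or "<root>"
        messages.append(f"{location}: {err.message}")
    return messages
```

For a manual check before a run, one could parse the pipeline file with something like PyYAML's `yaml.safe_load`, pass the result to `pipeline_config_errors`, and print whatever comes back. Reporting all violations at once, rather than raising on the first, seems friendlier for people maintaining custom configs in other repos.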
Here is a start at what a schema could look like for pipeline configs (auto-generated, not tested yet).