python-jsonschema / check-jsonschema

A CLI and set of pre-commit hooks for jsonschema validation with built-in support for GitHub Workflows, Renovate, Azure Pipelines, and more!
https://check-jsonschema.readthedocs.io/en/stable

Support for custom YAML tags #489

Open DevOpsJeremy opened 3 days ago

DevOpsJeremy commented 3 days ago

An earlier issue requested support specifically for the !reference tag in GitLab CI files. However, there doesn't appear to be support for other tags. In my case, I need Ansible-specific tags like !unsafe or !vault.

Example content:

controller_templates:
  - name: My job template
    extra_vars:
      pass: !unsafe '{{ my_value }}'

Output:

Traceback (most recent call last):
  File "/home/user/.local/bin/check-jsonschema", line 8, in <module>
    sys.exit(main())
  File "/home/user/.local/lib/python3.6/site-packages/check_jsonschema/__init__.py", line 26, in main
    ret = checker.run()
  File "/home/user/.local/lib/python3.6/site-packages/check_jsonschema/checker.py", line 83, in run
    self._run()
  File "/home/user/.local/lib/python3.6/site-packages/check_jsonschema/checker.py", line 66, in _run
    errors = self._build_error_map()
  File "/home/user/.local/lib/python3.6/site-packages/check_jsonschema/checker.py", line 56, in _build_error_map
    for filename, doc in self._instance_loader.iter_files():
  File "/home/user/.local/lib/python3.6/site-packages/check_jsonschema/loaders/instance/__init__.py", line 63, in iter_files
    data = loadfunc(fp)
  File "/home/user/.local/lib/python3.6/site-packages/ruamel/yaml/main.py", line 343, in load
    return constructor.get_single_data()
  File "/home/user/.local/lib/python3.6/site-packages/ruamel/yaml/constructor.py", line 113, in get_single_data
    return self.construct_document(node)
  File "/home/user/.local/lib/python3.6/site-packages/ruamel/yaml/constructor.py", line 123, in construct_document
    for _dummy in generator:
  File "/home/user/.local/lib/python3.6/site-packages/ruamel/yaml/constructor.py", line 723, in construct_yaml_map
    value = self.construct_mapping(node)
  File "/home/user/.local/lib/python3.6/site-packages/ruamel/yaml/constructor.py", line 440, in construct_mapping
    return BaseConstructor.construct_mapping(self, node, deep=deep)
  File "/home/user/.local/lib/python3.6/site-packages/ruamel/yaml/constructor.py", line 255, in construct_mapping
    value = self.construct_object(value_node, deep=deep)
  File "/home/user/.local/lib/python3.6/site-packages/ruamel/yaml/constructor.py", line 146, in construct_object
    data = self.construct_non_recursive_object(node)
  File "/home/user/.local/lib/python3.6/site-packages/ruamel/yaml/constructor.py", line 181, in construct_non_recursive_object
    data = constructor(self, node)
  File "/home/user/.local/lib/python3.6/site-packages/ruamel/yaml/constructor.py", line 743, in construct_undefined
    node.start_mark,
ruamel.yaml.constructor.ConstructorError: could not determine a constructor for the tag '!unsafe'
  in "job_templates.yml", line 337, column 18

The workaround in my case is to convert the content to JSON with yq and then validate the result, which isn't ideal.

Is it possible to add support for custom tags like these? One option would be to let the user supply a custom constructor for their tags, which check-jsonschema would then pass through to the ruamel.yaml parser.

sirosen commented 22 hours ago

Although a custom constructor is a more general solution -- and I like the generality in principle -- if yq is working for you as a preprocessor, then that means that you don't need to evaluate these tags in order for the content to validate. Stripping them out or ignoring them is enough. If there's an option to ignore unknown tags in ruamel.yaml, then that's something which would probably fit well with the existing CLI structure for baked-in transforms for Azure and GitLab.

If you really need customizations for the YAML parsing, I think the interface needs some serious thought. One solution I like would be to allow users to pass their own parser. Then you can do whatever you want with no limitations, but check-jsonschema's interface stays really simple to understand.
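To make the "pass their own parser" idea concrete, here is one possible shape for such an interface, using only the standard library. Everything here is hypothetical: the `module:function` spec format and the names `resolve_parser` and `load_instance` are invented for illustration and are not check-jsonschema options or APIs.

```python
import importlib
import io
from typing import Any, BinaryIO, Callable

# A parser is any callable that turns an open binary stream into data.
Parser = Callable[[BinaryIO], Any]


def resolve_parser(spec: str) -> Parser:
    """Resolve a hypothetical 'module:function' string to a callable."""
    module_name, sep, func_name = spec.partition(":")
    if not sep or not module_name or not func_name:
        raise ValueError(f"expected 'module:function', got {spec!r}")
    parser = getattr(importlib.import_module(module_name), func_name)
    if not callable(parser):
        raise TypeError(f"{spec!r} is not callable")
    return parser


def load_instance(stream: BinaryIO, parser: Parser) -> Any:
    """Load one instance document with the user-supplied parser."""
    return parser(stream)


# json.load stands in for a user's own YAML-aware parser function.
user_parser = resolve_parser("json:load")
instance = load_instance(io.BytesIO(b'{"pass": "{{ my_value }}"}'), user_parser)
print(instance)
```

With this shape, tag handling lives entirely in the user's function, and the tool only needs to know how to call it.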

I only want to pursue more complex solutions if there's sufficient demand. I think we can meet your needs with "ignore unknown tags" as an option, so I'll look into that first.