python-jsonschema / check-jsonschema

A CLI and set of pre-commit hooks for jsonschema validation with built-in support for GitHub Workflows, Renovate, Azure Pipelines, and more!
https://check-jsonschema.readthedocs.io/en/stable

!reference tag of .gitlab-ci.yml not supported? #112

Closed: teake closed this issue 2 years ago

teake commented 2 years ago

I can't get custom YAML tags to work with check-jsonschema. In particular, the GitLab CI schema defines a custom !reference tag, and validating a file that uses it raises an uncaught exception in check-jsonschema. Here's an example .gitlab-ci.yml (taken from the GitLab docs):

include:
  - local: setup.yml

.teardown:
  after_script:
    - echo deleting environment

test:
  script:
    - !reference [.setup, script]
    - echo running my own command
  after_script:
    - !reference [.teardown, after_script]

Validating it throws an exception:

$ check-jsonschema --builtin-schema vendor.gitlab-ci .gitlab-ci.yml
Traceback (most recent call last):
  File "/usr/local/bin/check-jsonschema", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/check_jsonschema/cli.py", line 269, in main
    execute(args)
  File "/usr/local/lib/python3.10/site-packages/check_jsonschema/cli.py", line 316, in execute
    ret = checker.run()
  File "/usr/local/lib/python3.10/site-packages/check_jsonschema/checker.py", line 88, in run
    self._run()
  File "/usr/local/lib/python3.10/site-packages/check_jsonschema/checker.py", line 74, in _run
    errors = self._build_error_map()
  File "/usr/local/lib/python3.10/site-packages/check_jsonschema/checker.py", line 64, in _build_error_map
    for filename, doc in self._instance_loader.iter_files():
  File "/usr/local/lib/python3.10/site-packages/check_jsonschema/loaders/instance/__init__.py", line 61, in iter_files
    data = loadfunc(fp)
  File "/usr/local/lib/python3.10/site-packages/check_jsonschema/loaders/instance/yaml.py", line 26, in load
    data = _yaml.load(stream)
  File "/usr/local/lib/python3.10/site-packages/ruamel/yaml/main.py", line 434, in load
    return constructor.get_single_data()
  File "/usr/local/lib/python3.10/site-packages/ruamel/yaml/constructor.py", line 121, in get_single_data
    return self.construct_document(node)
  File "/usr/local/lib/python3.10/site-packages/ruamel/yaml/constructor.py", line 131, in construct_document
    for _dummy in generator:
  File "/usr/local/lib/python3.10/site-packages/ruamel/yaml/constructor.py", line 668, in construct_yaml_seq
    data.extend(self.construct_sequence(node))
  File "/usr/local/lib/python3.10/site-packages/ruamel/yaml/constructor.py", line 225, in construct_sequence
    return [self.construct_object(child, deep=deep) for child in node.value]
  File "/usr/local/lib/python3.10/site-packages/ruamel/yaml/constructor.py", line 225, in <listcomp>
    return [self.construct_object(child, deep=deep) for child in node.value]
  File "/usr/local/lib/python3.10/site-packages/ruamel/yaml/constructor.py", line 154, in construct_object
    data = self.construct_non_recursive_object(node)
  File "/usr/local/lib/python3.10/site-packages/ruamel/yaml/constructor.py", line 189, in construct_non_recursive_object
    data = constructor(self, node)
  File "/usr/local/lib/python3.10/site-packages/ruamel/yaml/constructor.py", line 690, in construct_undefined
    raise ConstructorError(
ruamel.yaml.constructor.ConstructorError: could not determine a constructor for the tag '!reference'
  in ".gitlab-ci.yml", line 10, column 7

This is against check-jsonschema 0.16.1 and jsonschema 4.6.0.

The GitLab documentation says to add the custom tag to your YAML validator, but I'm not sure how to do that with check-jsonschema / jsonschema. Since GitLab's web editor has supported it since GitLab 15.1 (issue / MR), I was hoping check-jsonschema could support it as well.

sirosen commented 2 years ago

Thanks for the detailed issue report! I'm marking this as an enhancement request.

Custom tags are handled by the YAML parser (ruamel.yaml, in check-jsonschema's case), so there is a framework for supporting them, but it will require some work. The steps we need are:

I'm interested in pursuing this, but it will take some time to work out. In particular, I need to see how the !reference resolution gets hooked into YAML parsing and make sure we give it the right behavior. Once that's done, the various other bits should fall into place pretty easily.

The !reference parsing behavior will also, by the look of it, rely upon logic to read and evaluate gitlab-ci's include config.
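To make "evaluate" concrete: resolving a reference means walking a key path such as `[.teardown, after_script]` through the loaded document. The pure-Python sketch below assumes references were already loaded as a hypothetical `Reference(path)` placeholder. It only looks inside the current document, so anything defined in an included file (like `.setup` from setup.yml in the example) stays unresolved; it does not model GitLab's own merge or flattening rules and has no cycle detection.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Reference:
    """Hypothetical stand-in for a parsed `!reference [...]` value."""
    path: Tuple[str, ...]


def resolve(doc: dict, value: Any) -> Any:
    """Replace Reference objects with the value their key path points to in doc.

    References whose target is missing (e.g. defined via include) are kept as-is.
    """
    if isinstance(value, Reference):
        target: Any = doc
        for key in value.path:
            if not isinstance(target, dict) or key not in target:
                return value  # not resolvable locally
            target = target[key]
        return resolve(doc, target)  # the target may itself contain references
    if isinstance(value, dict):
        return {k: resolve(doc, v) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(doc, v) for v in value]
    return value
```

Calling `resolve(doc, doc)` on the example turns the `after_script` reference into `[["echo deleting environment"]]` but leaves the `.setup` reference untouched, which is exactly the limitation of local-only resolution.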

In terms of additional info, the thing I'm missing the most is the "how" regarding ruamel.yaml. That will take some reading, but if anyone has helpful tips I'm all ears!

teake commented 2 years ago

Great, thanks for picking this up!

To be clear, though, I am only interested in getting check-jsonschema to not trip up over !reference tags. I don't care whether they're recursively resolved. In fact, in general you cannot resolve them, because they can refer to snippets that come from include statements pointing to other GitLab projects, which you cannot access locally when you're outside of GitLab.

So if check-jsonschema can just treat the value of a !reference tag as a sequence of strings (as specified in GitLab's JSON schema), I'd be happy.

sirosen commented 2 years ago

This information is extremely useful, thank you! I was thinking we needed to process local include directives and wasn't aware of the fact that the includes could point at other repositories.

I think the basic outline / approach is still the same, though it will be quite a bit easier. I need to figure out how to instruct the YAML parser that a !reference should become an array of strings.

sirosen commented 2 years ago

This should be working now in v0.17.0. Please let me know if you see any issues with it!

teake commented 2 years ago

It works if I enable the --data-transform gitlab-ci switch. Thanks!
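For reference, the working invocation combines the schema flag from the original report with the new switch. A small Python wrapper, e.g. for a CI helper script; the function names are hypothetical and it assumes check-jsonschema 0.17.0 or later is installed on PATH:

```python
import subprocess
from typing import List


def gitlab_ci_check_command(path: str = ".gitlab-ci.yml") -> List[str]:
    """Build the check-jsonschema command line used in this thread."""
    return [
        "check-jsonschema",
        "--builtin-schema", "vendor.gitlab-ci",
        "--data-transform", "gitlab-ci",  # handle !reference tags instead of failing
        path,
    ]


def run_gitlab_ci_check(path: str = ".gitlab-ci.yml") -> int:
    """Run the check and return its exit code (non-zero signals a problem)."""
    return subprocess.run(gitlab_ci_check_command(path)).returncode
```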