python-jsonschema / check-jsonschema

A CLI and set of pre-commit hooks for jsonschema validation with built-in support for GitHub Workflows, Renovate, Azure Pipelines, and more!
https://check-jsonschema.readthedocs.io/en/stable
Other
191 stars 39 forks source link

[BUG] RegEx not recognised #436

Closed patvdleer closed 1 month ago

patvdleer commented 1 month ago

Using the gitlab-ci validation my RegEx on coverage isn't recognised as such while it does work.

Schema validation errors were encountered.
  .gitlab-ci.yml::$.unittests:python-3.12.coverage: '/(?i)total.*? (100(?:\\.0+)?\\%|[1-9]?\\d(?:\\.\\d+)?\\%)$/' is not a 'regex'
unittests:python-3.12:
  <<: *unittests
  image: python:3.12
  coverage: '/(?i)total.*? (100(?:\.0+)?\%|[1-9]?\d(?:\.\d+)?\%)$/'
  artifacts:
    reports:
      coverage_report:
        coverage_format: cobertura
        path: coverage.xml
sirosen commented 1 month ago

I've seen this problem before with another user. The issue is that this is a Ruby regex, using some Ruby-specific syntax, so it's not easy to parse in other languages.

In the prior case, the user was able to change their expression to only use more universal syntax. I don't know if it's possible for you to drop the case-insensitive marker (?I), but that's what immediately strikes me as problematic.

check-jsonschema would benefit from a new mode for handling Ruby regexes, but I don't know how we could implement it. Handling JS was doable via regress.

One thing you can do, while I think about this, is to disable regex format checking with --disable-formats regex.

p-rogalski commented 1 month ago

Related isssue: https://github.com/python-jsonschema/check-jsonschema/issues/397

The fix was to use the regex TOTAL.*? (100(?:\.0+)?\%|[1-9]?\d(?:\.\d+)?\%)$ instead. @sirosen asked GitLab to update their docs in https://gitlab.com/gitlab-org/gitlab/-/merge_requests/144268, which they did.

However, GitLab decided to update their docs again and reintroduced the old regex, which is probably a copy/paste mistake.

patvdleer commented 1 month ago

While changing it to TOTAL. should fix it, I gather that (?i) is the new way of writing it, I'm not sure how standard this is. If so I'm guessing the Python regex engine should be updated to fix it.

I'll mark this ticket as fixed/closed but it might be handy to add a note to the docs of check-jsonschema.

sirosen commented 1 month ago

However, GitLab decided to update their docs again and reintroduced the old regex, which is probably a copy/paste mistake.

Good catch! I see that it was a community contribution to reformat that page, and probably accidental. I'll open a MR to revert that part of the content. Thanks for chiming in with this.


As for supporting Ruby syntax... (?i) itself is a common syntax, but the trouble is that under non-Ruby engines those leading and trailing slashes don't make a ton of sense. The case insensitive marker is expected to be at the start of the expression. Which it is if you're speaking Ruby but isn't of you're speaking Python or JavaScript.

Stripping off the slashes is a possible start, but it's not sufficient for many features. Maybe I should consider it anyway, but I'd need a good way to document it.

sirosen commented 1 month ago

For cross-linkage and record keeping: I've opened up another fix PR for gitlab docs (https://gitlab.com/gitlab-org/gitlab/-/merge_requests/154991).

(I was perhaps a bit gruff over there. But I carefully researched the change that I put in on that doc, and it was blown away by something where they clearly didn't even look carefully at the render. So I was justifiably a bit grumpy.)