hyperjump-io / json-schema-bundle

A tool for bundling JSON Schema documents
MIT License
14 stars 1 forks source link

Invalidates regular expressions when parsing the file #2

Closed jonaslagoni closed 2 years ago

jonaslagoni commented 2 years ago

While the file does not contain any $ref it fails to get the schema regardless, so it is the minimal reproducible example:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "patternProperties": {
    "^x-[\\w\\d\\.\\-\\_]+$": true
  }
}

Code that is being run:

const Bundler = require("@hyperjump/json-schema-bundler");

(async () => {
  try {
    // Get the initial schema to pass to the bundler
    const main = await Bundler.get(`file://${__dirname}/../asyncapi.json`);

    // The bundler will fetch from the file system, network, or internal schemas as
    // needed to build to bundle.
    const bundle = await Bundler.bundle(main);

    console.log(bundle);
  } catch (error) {
    console.log(error);
  }
})();

The error being thrown:

SyntaxError: Invalid regular expression: /^x-[\w\d\.\-\_]+$/: Invalid escape
    at new RegExp (<anonymous>)
    at /x/bundler/node_modules/@hyperjump/json-schema/lib/keywords/patternProperties.js:7:50
    at Array.map (<anonymous>)
    at /xbundler/node_modules/@hyperjump/pact/lib/map.js:4:55
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
jdesrosiers commented 2 years ago

This is a bug in the schema, not in this library. As the error is telling you, the regular expression is invalid. Apparently node.js doesn't like it when you try to escape a character that shouldn't be escaped. In this case, that character appears to be _. Browsers appear to be more tolerant of this error.

jonaslagoni commented 2 years ago

Ha! I just assumed the regular expression was accurate, sorry! And actually, it is all the escaped characters that are troublesome 🧐 Weird none of the online validation tools does not complain about these...

On a side note, why does a bundler need to parse regular expressions? 🤔

jdesrosiers commented 2 years ago

Weird none of the online validation tools does not complain about these...

This won't have anything to do with the implementation because implementations don't implement their own regex engine. They use the one built into the platform they are running on. That's why all implementations will throw an error with this schema when running on node.js but not when running in a browser. It's because of the difference between the RegExp implementation used in browsers vs the one used in node.js. It has nothing to do with the JSON Schema implementation itself.

why does a bundler need to parse regular expressions?

It doesn't, but that functionality came by default. Hyperjump JSC is built specifically to facilitate building tools like this. It compiles the schema into an AST that I can then use to build whatever tooling I want (validator, bundler, code gen, etc). By using that library to generate an AST, I was able to skip the hardest part of this implementation. Which is why I was able to put this together in an afternoon. But, that's also why there are some things in the AST building process that have nothing to do with bundlers. Those things are there for the benefit of other tools I've built or might build using the same AST (including a validator).

jonaslagoni commented 2 years ago

Hyperjump JSC is built specifically to facilitate building tools like this. It compiles the schema into an AST that I can then use to build whatever tooling I want (validator, bundler, code gen, etc).

Makes sense! Thanks for the clarification 🙂

Regarding the regex, I still can't wrap my head around why this ain't working, and what we can do to fix it, or where the problem lies 😆 This is the regex we try to build: https://regexr.com/69hcv, however when described in a JSON file we need to escape the \ chars.

If I run a small node example:

// First test
const firstReg = new RegExp('^x-[\\w\\d\\.\\-\\_]+$', 'g');
const firstRes = firstReg.test('x-something');
console.log(firstRes);

// Second test
const secondReg = new RegExp('^x-[\w\d\.\-\_]+$', 'g');
const secondRes = secondReg.test('x-something');
console.log(secondRes);

It produces:

true 
false

The second regex is what the bundler tries to pass to RegExp, so if we change the regex in the JSON Schema to ^x-[\\\\w\\\\d\\\\.\\\\-\\\\_]+$ (to try and get the second option) it no longer matches accurately in browsers, however, works in your bundler, at least it is accepted in your implementation 🧐

So I am left puzzled 😆

jdesrosiers commented 2 years ago

I just figured out that browsers aren't working differently than node. The difference is the use of the u flag (for unicode support) in the regular expression (which is a SHOULD requirement in JSON Schema). When you use the u flag you get the error. When you don't it's more tolerant for some reason.

Your failing test case should be, new RegExp("^x-[\\w\\d\\.\\-\\_]+$", "u"). This regex has two unnecessary escapings: . and _, but _ appears to be the one that it's complaining about. This works, new RegExp("^x-[\\w\\d\\.\\-_]+$", "u"), but this is better new RegExp("^x-[\\w\\d.\\-_]+$", "u"). You could also avoid escaping the - by putting it at the end so it isn't ambiguous whether it's the character - or a range specifier, new RegExp("^x-[\\w\\d._-]+$", "u").

jonaslagoni commented 2 years ago

Thanks, @jdesrosiers! That should clear it up 🙇