kenchris / urlpattern-polyfill

URLPattern polyfill
https://www.npmjs.com/package/urlpattern-polyfill
MIT License
268 stars 31 forks source link

Execution times with double star syntax and dashes #121

Open lenovouser opened 1 year ago

lenovouser commented 1 year ago

Platform: macOS Architecture: arm64 Node: v18.18.0

The double-star syntax is unnecessary, since it also works with a single star, but I still want to report this to hopefully get it fixed.

If you execute the code below:

const { URLPattern } = require('urlpattern-polyfill');

const pattern = new URLPattern({ hostname: '**.google.com' });

const check = value => {
    if (pattern.test(value)) {
        return;
    }

    return value;
};

const start1 = process.hrtime.bigint();
const result1 = check('https://example.com');
const end1 = process.hrtime.bigint();

console.log(result1);
console.log('took ' + (end1 - start1) / 1000n + 'ms');

const start2 = process.hrtime.bigint();
const result2 = check('https://subdomain.subdomain.example.com');
const end2 = process.hrtime.bigint();

console.log(result2);
console.log('took ' + (end2 - start2) / 1_000_000n + 's');

const start3 = process.hrtime.bigint();
const result3 = check('https://dash-subdomain.subdomain.example.com');
const end3 = process.hrtime.bigint();

console.log(result3);
console.log('took ' + (end3 - start3) / 1_000_000n + 's');
johnp commented 1 month ago

This is reproducible in Node.js and Deno (both use V8), but not in Bun (uses WebKit). The actual regex which takes exponential time in this case is /^((?:.*)*)\.google\.com$/u.exec('subdomain.subdomain.example.com'), and it is executed here.

Running that RegEx in Firefox, SpiderMonkey notably gives up with an "InternalError: too much recursion". Epiphany (based on WebKit) behaves just like Bun, so WebKit is quite clearly is not affected. Even much longer test strings do not lead to an observable rise in execution time with WebKit.

Using V8's experimental linear time modifier (new RegExp('^((?:.*)*).google.com$', 'lu');), V8 says:

SyntaxError: Invalid regular expression: /^((?:.)).google.com$/lu: Cannot be executed in linear time

Using this Node.js debugging script:

const v8 = require('node:v8');
v8.setFlagsFromString('--enable-experimental-regexp-engine');
v8.setFlagsFromString('--default-to-experimental-regexp-engine');
v8.setFlagsFromString('--enable-experimental-regexp-engine-on-excessive-backtracks');
v8.setFlagsFromString('--trace-experimental-regexp-engine');

console.time('Time')
// biome-ignore lint/complexity/useRegexLiterals: bypass Node.js Regex-flags validator
const re = new RegExp('^((?:.*)*).google.com$', 'u'); // 'lu'
re.exec('subdomain.domain.example.com');
console.timeEnd('Time')

I get the output (irrelevant lines omitted)

Pattern not supported by experimental engine: 0x268a7e8ddf01 <String[24]: #^(?:(?:.)).google.com$> Pattern not supported by experimental engine: 0x268a7e8ddf01 <String[24]: #^(?:(?:.)).google.com$> Time: 14.836s

, so even V8's experimental RegEx engine doesn't help, unless one also removes the unicode flag:

Initializing experimental regexp 0x214a3521df01 <String[24]: #^(?:(?:.)).google.com$> Compiling experimental regexp 0x214a3521df01 <String[24]: #^(?:(?:.)).google.com$> Executing experimental regexp 0x214a3521df01 <String[24]: #^(?:(?:.)).google.com$> Time: 0.147ms

In that case, execution time is on par with WebKit.