tc39 / proposal-regexp-atomic-operators

https://rbuckton.github.io/proposal-regexp-atomic-operators
BSD 3-Clause "New" or "Revised" License
16 stars 1 forks source link

Suboptimal possible workaround #3

Closed barsdeveloper closed 1 year ago

barsdeveloper commented 1 year ago

For those interested to have this behavior, there is a trick explained here: https://blog.stevenlevithan.com/archives/mimic-atomic-groups I works in practice but it makes your regex slower, see the following example

const regex = /^(?=((a|[ab])*))\1$/;

function test(length) {
  const str = 'a'.repeat(length) + 'c';
  const now = performance.now();
  regex.test(str);
  return performance.now() - now;
}

for (let i = 0; i < 50; i++) {
  console.log({ length: i, time: test(i) });
}

Time is zero for every run including 49. The problem with this is that it makes regular matching slower, take a look at the following example: trying to match a large string containing json data (I took it from here: https://github.com/Chevrotain/chevrotain/blob/gh-pages/performance/samples/1K_json_no_escaping.js and removed any js code creating just a huge multi-line, single string about 5500 lines of text).

In order to match it I will use the following regex "[^"\\]*(?:\\.[^"\\]*)*" that matches correctly in about 30ms/40ms. https://regex101.com/r/5OcEL2/1 However if you remove the last " from the string (so that it can't match anymore), it goes timeout: https://regex101.com/r/iACj1f/1

When trying to apply the trick described previously, we obtain the following regex "(?=([^"\\]*))\1(?=((?:\\.(?=([^"\\]*))\3)*))\2" that still matches correctly but it does so in more than 70ms (more or less it takes double the amount of time): https://regex101.com/r/OdIzJg/1 And unfortunately it still goes out of time when removing the last "

It can still be useful in some cases, for example if I reduce the number of lines in this string to 2500 (and remove also the last " so that it's not a valid string, the normal version will timeout https://regex101.com/r/MIcwxw/1 The second version will fail to match in a bit more than 800ms https://regex101.com/r/fWhWQO/1 I had to reduce it to about 1500 lines to fail to match in a comparable amount of time (a bit over 800ms) https://regex101.com/r/2T38Kh/1

ljharb commented 1 year ago

@barsdeveloper why are you posting this in multiple repos and then immediately closing it?

barsdeveloper commented 1 year ago

Because it's mentioned already here: https://github.com/tc39/proposal-regexp-atomic-operators/issues/2

ljharb commented 1 year ago

Duplicate of #2, then.