uhop / node-re2

node.js bindings for RE2: fast, safe alternative to backtracking regular expression engines.
Other
495 stars 53 forks source link

Odd issue with re2 after resetting yarn.lock file #94

Closed niftylettuce closed 3 years ago

niftylettuce commented 3 years ago

This seems like re2 related, but I thought we should major bump on breaking changes in re2 here. I guess not though, and we have to be extra safe.

Here's a code example which shows the breaking change:

str.replace(URL_REGEX, (match, ...args) => { // <-- breaking change here, need to do `(match, offset, string)` instead
  const offset = args[12]; // <--- breaking change here
  const string = args[13]; // <--- breaking change here
  const nextChar = string.slice( // <--- throws error cannot read "slice" of undefined
    offset + match.length,
    offset + match.length + 1
  );
  // NOTE: that matches such as foo.iS will still match as foo.is
  // so we may want to have case-sensitivity option for url-regex
  // in the future, or do that case-sensitivity matching on the tld here
  // (Gmail for example only matches FOO.COM or foo.com)
  // (but note it breaks foo.itýbeep.com into foo.it and ýbeep.com)
  // (note Gmail matches "test.it123.com.foobar_123.com" incorrectly)
  if (new RE2(/^\w$/).test(nextChar)) return '';
  return match;
});

You can reproduce this by:

git clone https://github.com/spamscanner/spamscanner.git
cd spamscanner
npm install
npm test

You'll see the same error here.

If you use "resolutions" field to lock it, then the error goes away.

Previous builds were running fine on v1.15.4 in the past, but if I rebuild it will error. See https://travis-ci.com/spamscanner/spamscanner.

niftylettuce commented 3 years ago

I'm trying to narrow down which version this bug happens in, one moment...

niftylettuce commented 3 years ago

I can't seem to narrow this down to re2. If you wanted to lend a hand @uhop, you could just fork my repo, try to bump deps, and see which one breaks. I'm trying to figure this out now myself 🤦.

niftylettuce commented 3 years ago

Okay so as soon as I remove yarn.lock and re-run yarn followed by a yarn test, the once passing tests are now breaking. So it's some child dependency or some weird dependency nesting issue causing this. Not 100% sure, could be re2 still but I'm not positive because I haven't been able to narrow out a single package yet otherwise @uhop.

uhop commented 3 years ago

When I have time I will write more tests for replace() and will try to find the problem from my side. TBH, I didn't expect any API change. Everything should be exactly as it is defined in String.prototype.replace().

I suggest going back to 1.15.4 for now and bump it back when the problem is fixed on my side (if it is my problem, which currently looks very likely).

niftylettuce commented 3 years ago

Here's a yarn lock diff, I tried 1.15.4 and it doesn't work still.

https://gist.github.com/niftylettuce/be299a638efd055f57a6489ef7d434b9

niftylettuce commented 3 years ago

I've even tried using npm directly via https://github.com/rogeriochaves/npm-force-resolutions and it still does not work. There is something quite odd going on here.

niftylettuce commented 3 years ago

This does not seem to be a problem with re2, at least I think... it seems like it is though - but I've locked version everywhere to 1.15.4, but the issue still persists.

niftylettuce commented 3 years ago

I also upgraded from Node v12 LTS to Node v14 LTS recently, investigating this now.

niftylettuce commented 3 years ago

Not a Node v12 vs v14 issue, just tested.

niftylettuce commented 3 years ago

My guess is it's something to do with Ava/Babel at this point but I'm still investigating.

uhop commented 3 years ago

My guess is it's something to do with Ava/Babel at this point but I'm still investigating.

Is it possible that Babel compiles/transforms code that it shouldn't? It could be tested by creating a simple file, which does just one operation that fails and see if it works or not. Then, depending on results, inspect the offending code to find a difference, if there is one.

I had a problem in the past (not related to re2) when Babel failed to compile some code. I "fixed" it by upgrading Babel. My point is that Babel can have bugs too.

niftylettuce commented 3 years ago

Ah-ha. I figured it out, it seems that url-regex-safe is the culprit. Trying to investigate as to why now. I went one by one.

niftylettuce commented 3 years ago

Found it. https://github.com/sindresorhus/ip-regex/commit/4147df0401a28552de138b0897bcacf7ec3a298d

niftylettuce commented 3 years ago

Thanks for your help and being here @uhop ❤️