uhop / node-re2

node.js bindings for RE2: fast, safe alternative to backtracking regular expression engines.
Other
495 stars 53 forks source link

Multiple re2 installations cause a SIGABRT failure #73

Closed seshness closed 4 years ago

seshness commented 4 years ago

I have a somewhat unusual situation with multiple installations of node-re2 in a single node process.

Here's a minimized directory structure that replicates the situation:

test-two-re2s
├── README.md
├── main.js
├── node_modules
│   ├── module-a
│   │   ├── index.js
│   │   ├── node_modules
│   │   │   └── re2
│   │   └── package.json
│   └── module-b
│       ├── index.js
│       ├── node_modules
│       │   └── re2
│       └── package.json
└── package.json

and a tarball of this setup: test-two-re2s.tar.gz

Running a script that requires both instances of re2 causes a SIGABRT:

± node main.js
node(92720,0x108e79dc0) malloc: *** error for object 0x102a484a0: pointer being freed was not allocated
node(92720,0x108e79dc0) malloc: *** set a breakpoint in malloc_error_break to debug
[1]    92720 abort      node main.js

lldb shows this occurs in node::Environment::RunCleanup(), but I don't have a debug build of node handy to examine this further:

* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x00007fff6785b33a libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
->  0x7fff6785b33a <+10>: jae    0x7fff6785b344            ; <+20>
    0x7fff6785b33c <+12>: movq   %rax, %rdi
    0x7fff6785b33f <+15>: jmp    0x7fff67855629            ; cerror_nocancel
    0x7fff6785b344 <+20>: retq
Target 0: (node) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x00007fff6785b33a libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff67917e60 libsystem_pthread.dylib`pthread_kill + 430
    frame #2: 0x00007fff677e2808 libsystem_c.dylib`abort + 120
    frame #3: 0x00007fff678d850b libsystem_malloc.dylib`malloc_vreport + 548
    frame #4: 0x00007fff678db40f libsystem_malloc.dylib`malloc_report + 151
    frame #5: 0x000000010003a8cc node`node::Environment::RunCleanup() + 164
    frame #6: 0x00000001000b67ee node`node::NodeMainInstance::Run() + 572
    frame #7: 0x000000010005dc1f node`node::Start(int, char**) + 294
    frame #8: 0x00007fff67713cc9 libdyld.dylib`start + 1
    frame #9: 0x00007fff67713cc9 libdyld.dylib`start + 1
(lldb)

I only ran into this error when upgrading from v1.10.x of node-re2 to v1.15.0.

I can try to rearrange my application's dependency tree to only have one re2 instance in the node process, but I wonder if there's a broader problem with how the module is registered?

Thanks for publishing this module!

uhop commented 4 years ago

Node has changed how modules are registered. This change was reflected in re2 starting with 1.11.0.

TBH, I don't think that Node ever supported loading the same binary modules. If it worked, it likely was only by accident.

Just out of curiosity: why do you want to load two (identical?) copies of re2 in the same process?

seshness commented 4 years ago

That's a great question 😅 I have a monorepo with multiple, custom, otherwise independent node modules. Each of these node modules has their own dependencies in package.json, and their own independent npm ci-generated node_modules folder. Two of them use re2.

One of our services uses both of those node modules, and requires both copies of re2.

I've refactored to use only one installation of re2 in the process, so if this isn't something the node-re2 project will explicitly support then please feel free to close the issue!

uhop commented 4 years ago

re2 is a glue code that wraps RE2 (a C++ library) and presents it as a RegExp drop-in replacement. In most cases it is repackaging JavaScript structures to required C++ structures and back. It has nothing related to managing a life-cycle or the way it is registered/loaded. It is a pretty standard Node binary extension project written in C++ with NAN and compiled by node-gyp. There are absolutely no smarts in the registration code.

Up to 1.10.0 re2 was used in two scenarios: single threaded and in child processes. Nobody reported any problems. Yet at some point I started to get a feedback that it doesn't work in a multi-threaded case (see #57 and #65). It turned out that Node has changed the registration process, so I had to update my code. I suspect that the Node change has affected how multiple instances of the same module are handled.

The changes I've made were mostly about two things:

The latter makes code arguably more robust in the reuse/multicore scenarios.

Obviously it is possible that there is a latent bug in my code related to allocating memory directly or indirectly using facilities other than malloc(), which is expected from a C++ code. I will audit my code looking for that.

Meanwhile, I have a question for you: do you use Buffer in your code with re2? It may help me narrowing down the search of the problem.

uhop commented 4 years ago

I went over my code looking for dynamically allocated memory and how it is freed. Nothing suspicious was found. I chalk the problem off as not supported scenario by Node.

katlim-br commented 2 years ago

Hello being 2021, we found the same error, we imported npm "email-forward-parser" and "url-regex-safe" both use node-re2 v1.16.0.

So when we run jest --runInBand, we see the error

node(49844,0x1138a2600) malloc: *** error for object 0x7f824d511ff0: pointer being freed was not allocated
node(49844,0x1138a2600) malloc: *** set a breakpoint in malloc_error_break to debug

UPDATE: lucky for us url-regex-safe library just added v3 which exactly solves this issue by adding RE2 as a peer dependency, and not a direct one. For us, this means that the RE2 dependency coming from email-forward-parser would kick in, and both libraries will share the one same instance.

@uhop is there a possibility for a deeper review about having unfixed global state? We are still seeing the issue, and after some discussions with email-forward-parser author, indeed the need to put it as peerDependency is not the "ideal solution". Removing global state (as mentioned here before) is the right way to go to avoid any memory sharing problems.

titanism commented 1 year ago

Hi - we're the authors of url-regex-safe, happy that the peer dep helped here. We had a similar issue with upgrading RE2 from v17.7.7 to v17.7.8, see #143. Just something to be aware of (at least on macOS platforms).

uhop commented 11 months ago

I decided to check out ava, so I created a minimal project, and ... it WFM.

The package.json:

{
  "name": "test-ava",
  "version": "1.0.0",
  "description": "",
  "type": "module",
  "main": "index.js",
  "scripts": {
    "test": "ava"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "devDependencies": {
    "ava": "^5.3.1"
  },
  "dependencies": {
    "re2": "^1.20.5"
  }
}

I had three identical test files:

import RE2 from 're2';
import test from 'ava';

test('#1', t => {
    for (let i = 0; i < 1000000; ++i) {
        t.is(new RE2('\\d').test('1'), true);
        t.is(new RE2('\\d').test('a'), false);
    }
});

The result:

$ time npm test

> test-ava@1.0.0 test
> ava

  ✔ test2 › #1 (5.4s)
  ✔ test1 › #1 (5.4s)
  ✔ test3 › #1 (5.4s)
  ─

  3 tests passed

real    0m6.005s
user    0m15.146s
sys 0m1.768s

I can see that all three files were executed in parallel (the real time is 6s). What am I doing wrong? Can you repro this project?