josephg / node-foundationdb

Modern Node.js FoundationDB bindings

SIGSEGV in futures handler under linux #28

Closed · josephg closed this issue 5 years ago

josephg commented 5 years ago

This is an issue tracking apple/foundationdb#1294. It seems like there's a race condition (or something similar) in the threading code that hands futures back to the main thread via uv_async_send, which under rare conditions causes a crash.

aikoven commented 5 years ago

> What does your load look like? Is it mostly reads? Mostly writes? What sort of TPS rate are you running? Are you using atomics? Watches?

We're implementing an event store with two basic operations:

In our test we have 10 readers per writer and we're currently at about 3k writes per second.

We're currently not using atomics.

> Are you calling any transaction / database methods with callbacks, or are you just using promises like the documentation suggests?

Only promises.

> What version of nodejs are you using? (`node --version`)

```
# node --version
v10.13.0
```

> Is the foundationdb server on the same machine as the client?

It's on a different machine. We have dedicated machines for the FDB cluster in the cloud.

aikoven commented 5 years ago

I'll try to make a minimal implementation that reproduces this error, because our layer has already become pretty sophisticated.

josephg commented 5 years ago

Awesome; that’d be great. Sounds similar to what I’m doing in statecraft.

Given what you've said, there's a decent chance we can repro this with the binding tester. I'll spin up an Ubuntu machine with the stack and run it overnight to see if I can make it crash.

aikoven commented 5 years ago

Here's a minimal version of our event store: https://gist.github.com/aikoven/b6788f3ce0d146302d1a9d9201576bb7

I've run it locally (in Docker) for several hours and it didn't crash. Then we deployed 12 instances of this script to the cloud and got a segfault within 5 minutes.

In both cases, the client part runs in Docker. As for FDB, we have a cluster of 20 FDB processes in the cloud. For local development, I use a single-process database from the official FDB Docker image.

aikoven commented 5 years ago

We purposely write events to the same stream to simulate contention. So the crash may have something to do with transaction retries.
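
For anyone reading along: retries matter here because each conflicted commit makes the client re-run the transaction, so a contended append churns through several futures (and async handles) per logical write. Below is a rough sketch of that retry cycle using the FDB C API directly; the gist itself goes through this binding's promise API, so this is purely illustrative and `append_event` is a made-up name.

```c
/* Hedged sketch of the client-side retry loop FDB runs on conflicts.
 * Blocking form for brevity; illustrative only. */
#define FDB_API_VERSION 600
#include <foundationdb/fdb_c.h>
#include <stdint.h>

static fdb_error_t append_event(FDBDatabase* db,
                                const uint8_t* key, int key_len,
                                const uint8_t* value, int value_len) {
  FDBTransaction* tr;
  fdb_error_t err = fdb_database_create_transaction(db, &tr);
  if (err) return err;

  for (;;) {
    /* ...reads of the current stream version would go here; those reads
       create the conflict ranges that make concurrent appends collide... */
    fdb_transaction_set(tr, key, key_len, value, value_len);

    FDBFuture* commit = fdb_transaction_commit(tr);
    fdb_future_block_until_ready(commit);
    err = fdb_future_get_error(commit);
    fdb_future_destroy(commit);
    if (!err) break;                       /* committed */

    /* on_error resolves successfully only for retryable errors (e.g. a
       conflict); it also backs off and resets the transaction. */
    FDBFuture* retry = fdb_transaction_on_error(tr, err);
    fdb_future_block_until_ready(retry);
    err = fdb_future_get_error(retry);
    fdb_future_destroy(retry);
    if (err) break;                        /* not retryable */
  }

  fdb_transaction_destroy(tr);
  return err;
}
```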

josephg commented 5 years ago

Small update: I've been running 5 instances of this script for the last 6 hours on a little FreeBSD box I have lying around, with node v10.15.2 and fdb 6.1.0. I was hoping I could repro this problem, but no dice - it's stable. I'll fire up an Ubuntu box next. Either this issue is Linux-only, the bug is specific to fdb 6.0.x, or it needs more latency to show up. Or maybe it specifically needs node 10.13.

josephg commented 5 years ago

Small update: Managed to reproduce this on Ubuntu 17.10 / node 10.13.0 / fdb 6.0.18. Still no repro on FreeBSD / fdb 6.1. There's probably a hint there, but in any case I can work on it now that I can reproduce the problem. Thanks for putting together a test case @aikoven - it would have been much harder to debug this issue without it.

josephg commented 5 years ago

I've been playing with this for a couple of days and I'm still not 100% sure what's causing it. At this point I suspect it's a bug in libuv, but I'm not entirely sure.

The crash is on this line in libuv. Under normal behaviour, the following things happen:

  1. FDBFuture is created in the main thread. uv_async created in main thread.
  2. FDBFuture resolves in fdb networking thread. uv_async_send called to signal to the main thread that the corresponding future is ready.
  3. The main thread resolves the FDB future and closes the uv_async object.
  4. The network thread returns from uv_async_send and control passes back into libfdb_c.

Instead what is happening is that sometimes:

  1. FDBFuture is created in the main thread. uv_async created in main thread.
  2. FDBFuture resolves in fdb networking thread. uv_async_send called to signal to the main thread that the corresponding future is ready.
  3. BEFORE uv_async_send returns, the main thread resolves the FDB future and closes the uv_async object.
  4. uv_async_send reads some memory from the freed uv_async object. This is a use-after-free, and it triggers the SIGSEGV.

This makes sense given that there aren't any read barriers in uv_async_send, so the compiler is free to reorder reads around the write syscall (which is what actually yields to the main thread, and ends up freeing the uv_async object).
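
To make that concrete, here's a rough C sketch of the per-future handoff described above. The names (`FutureCtx`, `watch_future`, the callbacks) and the structure are made up for illustration; this is not the actual node-foundationdb source.

```c
/* Illustrative sketch of the per-future handoff; not the real binding code. */
#include <stdlib.h>
#include <uv.h>

#define FDB_API_VERSION 600
#include <foundationdb/fdb_c.h>

typedef struct {
  uv_async_t async;    /* one uv_async per FDBFuture */
  FDBFuture* future;
} FutureCtx;

static void on_closed(uv_handle_t* handle) {
  free(handle->data);  /* frees the FutureCtx, including the uv_async_t */
}

/* Main (libuv) thread: the async fires, we resolve the future and tear down. */
static void on_ready_main_thread(uv_async_t* async) {
  FutureCtx* ctx = (FutureCtx*) async->data;
  /* ...read the result and resolve the JS promise here... */
  fdb_future_destroy(ctx->future);
  uv_close((uv_handle_t*) &ctx->async, on_closed);
  /* Even though the memory is only freed in on_closed, the network thread
   * may still be part-way through uv_async_send() at this point - that is
   * the window in which the use-after-free above happens. */
}

/* FDB network thread: fdb_future_set_callback() invokes this when ready. */
static void on_ready_network_thread(FDBFuture* f, void* arg) {
  (void) f;                     /* same future as ctx->future */
  FutureCtx* ctx = (FutureCtx*) arg;
  uv_async_send(&ctx->async);   /* races with the teardown on the main thread */
}

/* Main thread: called for every future the JS side starts waiting on. */
static void watch_future(uv_loop_t* loop, FDBFuture* f) {
  FutureCtx* ctx = malloc(sizeof(*ctx));
  ctx->future = f;
  uv_async_init(loop, &ctx->async, on_ready_main_thread);
  ctx->async.data = ctx;
  fdb_future_set_callback(f, on_ready_network_thread, ctx);
}
```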

What doesn't make sense is that this is the assembly listing for uv_async_send (node 11.12.0 / libuv 1.26):

ASM listing for uv_async_send:

```asm
0000000000aab790 <uv_async_send>:
  aab790:  mov    0x78(%rdi),%eax
  aab793:  test   %eax,%eax
  aab795:  jne    aab7a5
  aab797:  mov    $0x1,%edx
  aab79c:  lock cmpxchg %edx,0x78(%rdi)
  aab7a1:  test   %eax,%eax
  aab7a3:  je     aab7b0
  aab7a5:  xor    %eax,%eax
  aab7a7:  retq
  aab7a8:  nopl   0x0(%rax,%rax,1)
  aab7b0:  push   %rbp
  aab7b1:  mov    %rsp,%rbp
  aab7b4:  push   %r14
  aab7b6:  push   %r13
  aab7b8:  push   %r12
  aab7ba:  push   %rbx
  aab7bb:  mov    0x8(%rdi),%rax
  aab7bf:  mov    0x200(%rax),%ebx    # <---- CRASH HERE
  aab7c5:  mov    0x1f8(%rax),%edx
  aab7cb:  cmp    $0xffffffff,%ebx
  aab7ce:  je     aab820
  aab7d0:  mov    $0x1,%r13d
  aab7d6:  mov    $0x1,%r14d
  aab7dc:  mov    $0x1d727ef,%r12d
  aab7e2:  jmp    aab7f4
  aab7e4:  nopl   0x0(%rax)
  aab7e8:  callq  8d9f70 <__errno_location@plt>
  aab7ed:  mov    (%rax),%eax
  aab7ef:  cmp    $0x4,%eax
  aab7f2:  jne    aab840
  aab7f4:  mov    %r13,%rdx
  aab7f7:  mov    %r12,%rsi
  aab7fa:  mov    %ebx,%edi
  aab7fc:  callq  8d9fb0              # <---- WRITE SYSCALL HERE
  aab801:  cmp    $0xffffffff,%eax
  aab804:  movslq %eax,%rdx
  aab807:  je     aab7e8
  aab809:  cmp    %rdx,%r14
  aab80c:  jne    aab845
  aab80e:  pop    %rbx
  aab80f:  xor    %eax,%eax
  aab811:  pop    %r12
  aab813:  pop    %r13
  aab815:  pop    %r14
  aab817:  pop    %rbp
  aab818:  retq
  aab819:  nopl   0x0(%rax)
  aab820:  mov    %edx,%ebx
  aab822:  mov    $0x8,%r13d
  aab828:  mov    $0x8,%r14d
  aab82e:  mov    $0x1c00500,%r12d
  aab834:  jmp    aab7f4
  aab836:  nopw   %cs:0x0(%rax,%rax,1)
  aab840:  cmp    $0xb,%eax
  aab843:  je     aab80e
  aab845:  callq  8d97b0
  aab84a:  nopw   0x0(%rax,%rax,1)
```

The crash happens at instruction aab7bf (line 131 in async.c: `fd = loop->async_wfd;`). But the write hasn't happened yet, so what I can't figure out is how the other thread is getting notified and why it's been woken up.
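
Reading that listing back into rough C (my paraphrase reconstructed from the disassembly, not copied from libuv's async.c; `compare_and_swap_` is a stand-in for the `lock cmpxchg`):

```c
/* Rough C paraphrase of the disassembly above; not libuv's actual source. */
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <uv.h>

/* Stand-in for the lock cmpxchg in the listing. */
static int compare_and_swap_(int* p, int oldval, int newval) {
  return __sync_bool_compare_and_swap(p, oldval, newval);
}

int uv_async_send_paraphrase(uv_async_t* handle) {
  /* Fast path: a wakeup is already pending.       mov 0x78(%rdi),%eax */
  if (handle->pending != 0)
    return 0;

  /* Claim the pending flag.               lock cmpxchg %edx,0x78(%rdi) */
  if (!compare_and_swap_(&handle->pending, 0, 1))
    return 0;

  uv_loop_t* loop = handle->loop;        /* mov 0x8(%rdi),%rax            */
  int fd = loop->async_wfd;              /* mov 0x200(%rax),%ebx - CRASH  */
  const void* buf = "";                  /* 1-byte write to the pipe...   */
  size_t len = 1;

  static const uint64_t one = 1;
  if (fd == -1) {                        /* ...or 8 bytes to the eventfd  */
    fd = loop->async_io_watcher.fd;      /* mov 0x1f8(%rax),%edx          */
    buf = &one;
    len = sizeof(one);
  }

  ssize_t r;
  do
    r = write(fd, buf, len);             /* callq 8d9fb0 - WRITE SYSCALL  */
  while (r == -1 && errno == EINTR);     /* cmp $0x4,%eax                 */

  if (r == (ssize_t) len)
    return 0;
  if (r == -1 && errno == EAGAIN)        /* cmp $0xb,%eax                 */
    return 0;
  abort();                               /* callq 8d97b0                  */
}
```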

A few notes on this:

aikoven commented 5 years ago

I really appreciate your working on this. Unfortunately I’m not experienced enough in these low-level matters, so I can’t give any valuable comment here. But please tell me if you need any assistance from my side 🙂

josephg commented 5 years ago

Sure, thanks 😅

josephg commented 5 years ago

Update: the bug is actually a long-standing, previously unknown bug in libuv. @bnoordhuis has put together a PR to fix the issue with a spin lock (just like I was suggesting above). In the meantime I've put together a simple spin lock around the node fdb code, which seems to fix the issue (I ran it for a few hours and it worked fine). I'll push it and publish tomorrow. That should resolve the crashes you're seeing in prod.
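
For anyone following along, here's the shape of that guard as a sketch (illustrative only, continuing the made-up `FutureCtx` names from the earlier sketch; not necessarily the exact code that will land in the package):

```c
/* Sketch of a spin-lock guard around the per-future handoff; illustrative only. */
#include <stdatomic.h>
#include <stdlib.h>
#include <uv.h>

#define FDB_API_VERSION 600
#include <foundationdb/fdb_c.h>

typedef struct {
  uv_async_t async;
  FDBFuture* future;
  atomic_bool sending;   /* true while the network thread is inside uv_async_send */
} FutureCtx;

static void on_closed(uv_handle_t* handle) {
  free(handle->data);    /* frees the FutureCtx, including the uv_async_t */
}

/* FDB network thread */
static void on_ready_network_thread(FDBFuture* f, void* arg) {
  (void) f;
  FutureCtx* ctx = (FutureCtx*) arg;
  atomic_store(&ctx->sending, true);
  uv_async_send(&ctx->async);
  atomic_store(&ctx->sending, false);  /* only now may the handle be torn down */
}

/* Main (libuv) thread */
static void on_ready_main_thread(uv_async_t* async) {
  FutureCtx* ctx = (FutureCtx*) async->data;
  /* Wait for the network thread to fully leave uv_async_send() before closing.
   * The window is a handful of instructions, so spinning is cheap. */
  while (atomic_load(&ctx->sending)) { /* spin */ }
  /* ...resolve the JS promise... */
  fdb_future_destroy(ctx->future);
  uv_close((uv_handle_t*) &ctx->async, on_closed);
}

/* Main thread: one handle per future, as before, plus the guard flag. */
static void watch_future(uv_loop_t* loop, FDBFuture* f) {
  FutureCtx* ctx = malloc(sizeof(*ctx));
  ctx->future = f;
  atomic_init(&ctx->sending, false);
  uv_async_init(loop, &ctx->async, on_ready_main_thread);
  ctx->async.data = ctx;
  fdb_future_set_callback(f, on_ready_network_thread, ctx);
}
```

The ordering that makes this work: the flag is set before the write inside uv_async_send that wakes the main thread, so by the time the main-thread callback runs it is guaranteed to see `sending == true` until uv_async_send has fully returned.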

Meanwhile, it looks like I'm using uv_async_send in a really inefficient way. Rewriting all this code on top of N-API will probably result in a significant performance boost, because it will cut down on async handle allocations and syscalls.
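
To illustrate what "cut down on async handle allocations and syscalls" could look like, here's one possible shape (not necessarily what the N-API rewrite will do): a single long-lived uv_async plus a mutex-protected queue of ready futures, so any number of futures share one handle and one coalesced wakeup. All names here are made up for the sketch.

```c
/* Sketch: one shared uv_async + a ready queue instead of one handle per future. */
#include <stdlib.h>
#include <uv.h>

#define FDB_API_VERSION 600
#include <foundationdb/fdb_c.h>

typedef struct ReadyNode {
  FDBFuture* future;
  struct ReadyNode* next;
} ReadyNode;

static uv_async_t ready_async;     /* created once at module init, never closed */
static uv_mutex_t ready_lock;
static ReadyNode* ready_head;

/* FDB network thread: push the future and poke the (single) async handle. */
static void on_ready_network_thread(FDBFuture* f, void* arg) {
  (void) arg;
  ReadyNode* node = malloc(sizeof(*node));
  node->future = f;
  uv_mutex_lock(&ready_lock);
  node->next = ready_head;
  ready_head = node;
  uv_mutex_unlock(&ready_lock);
  uv_async_send(&ready_async);     /* coalesces: at most one wakeup pending */
}

/* Main thread: drain everything that became ready since the last wakeup. */
static void on_ready_main_thread(uv_async_t* async) {
  (void) async;
  uv_mutex_lock(&ready_lock);
  ReadyNode* node = ready_head;
  ready_head = NULL;
  uv_mutex_unlock(&ready_lock);
  while (node) {
    /* ...resolve the corresponding JS promise... */
    fdb_future_destroy(node->future);
    ReadyNode* next = node->next;
    free(node);
    node = next;
  }
}

static void init_ready_queue(uv_loop_t* loop) {
  uv_mutex_init(&ready_lock);
  uv_async_init(loop, &ready_async, on_ready_main_thread);
}
```

A nice side effect of this design: the shared handle is never closed while a send is in flight, which sidesteps this whole class of race on top of saving allocations and wakeups.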

josephg commented 5 years ago

I've just published 📦 0.9.2 with a workaround for this issue via spin locks, while we wait for the new libuv code to trickle through to nodejs.

@aikoven can you grab 0.9.2 and confirm that it fixes your bug?

aikoven commented 5 years ago

This is amazing! Thank you.

We stopped our test cluster for now because we've finished with our load test. So, unfortunately, I can't check the fix right now. Let's consider this fixed for now, and I will check back once we have a new cluster.

Thanks again!

josephg commented 5 years ago

Great. Let me know if you still run into problems!

aikoven commented 4 years ago

Hey, we're continuing our FDB project, and I can confirm that we don't get these errors anymore.

Thanks!