josephg / node-foundationdb

Modern Node.js FoundationDB bindings

Possible memory leak #48

Open · aikoven opened 3 years ago

aikoven commented 3 years ago

We noticed a memory leak in our services that use node-foundationdb. It appears to be somewhere in native code, because the NodeJS heap size stays at a consistent level while the total process memory grows.

I made a repro, see https://github.com/aikoven/node-foundationdb-memory-leak-repro

This repro makes simple read-only transactions in a loop:

while (true) {
  await db.doTn(async tn => {
    await tn.get('');
  });
}

After about an hour, the process consumed about 700MB of memory, while NodeJS heap usage was consistently ~90MB.
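
The "NodeJS heap" vs "process memory" numbers in this thread come from sampling the process periodically. A minimal sketch of that kind of measurement, assuming Node's built-in process.memoryUsage() (not necessarily the exact code from the repro repo):

// Log V8 heap usage vs. total process memory (RSS) every 10 seconds.
setInterval(() => {
  const { heapUsed, rss } = process.memoryUsage();
  const toMB = n => (n / 1024 / 1024).toFixed(2) + 'MB';
  console.log(`NodeJS heap: ${toMB(heapUsed)}   Process memory: ${toMB(rss)}`);
}, 10000);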

josephg commented 3 years ago

Hm - thanks for the report. That's a worry. It probably won't matter, but what versions of nodejs / fdb are you using, and on what OS?

aikoven commented 3 years ago

Ah, right, sorry. I'm using the node:12 docker image, which is based on Debian (I guess).

# uname -a
Linux 48d478a21ee4 5.4.39-linuxkit #1 SMP Fri May 8 23:03:06 UTC 2020 x86_64 GNU/Linux

FDB server and client are of version 6.2.28.

josephg commented 3 years ago

Yep, confirmed it leaks. It leaks way faster without the tn.get() call:

const fdb = require('foundationdb')
fdb.setAPIVersion(620)

const db = fdb.open()

;(async () => {
  while (true) {
    await db.doTn(async tn => {
    });
  }
})()

I've tracked down the issue and seem to have fixed it. I wasn't freeing a temporary heap-allocated structure after transactions were passed to libfdb_c.

josephg commented 3 years ago

📦 foundationdb@1.0.3. See if that fixes things!

aikoven commented 3 years ago

With 1.0.3, I still observe growing memory consumption.

Without the tn.get('') call, it reaches 1.5GB within 1 minute. With tn.get() it grows from 200MB to 450MB in 30 minutes and continues to grow.

josephg commented 3 years ago

Can you try checking out and building from source, to see if you can still reproduce the behaviour? That sounds like the behaviour I was seeing in 1.0.2. Maybe something went wrong in the publishing process? (I rewrote the node module publishing scripts yesterday.)

I'm still seeing a (much slower) memory leak in 1.0.3. Without the tn.get() call, process memory usage goes from 18MB to 24MB in about a minute. Valgrind doesn't show the leak, regardless of how many iterations I run the process for. I'm wondering if maybe there are some internal FDB structures we aren't cleaning up properly, and the foundationdb C library deallocates that memory automatically when its thread is terminated (hence those allocations not showing up in valgrind).

These are the valgrind stats from running 10M iterations (with no tn.get() call):

==36832== HEAP SUMMARY:
==36832==     in use at exit: 201,613 bytes in 834 blocks
==36832==   total heap usage: 14,881,904 allocs, 14,881,070 frees, 836,096,099 bytes allocated
...
==36832== LEAK SUMMARY:
==36832==    definitely lost: 64 bytes in 1 blocks
==36832==    indirectly lost: 0 bytes in 0 blocks
==36832==      possibly lost: 304 bytes in 1 blocks
==36832==    still reachable: 201,245 bytes in 832 blocks
==36832==                       of which reachable via heuristic:
==36832==                         stdstring          : 18,155 bytes in 313 blocks
==36832==                         multipleinheritance: 48 bytes in 1 blocks
==36832==         suppressed: 0 bytes in 0 blocks
==36832== Reachable blocks (those to which a pointer was found) are not shown.
==36832== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==36832== 
==36832== For lists of detected and suppressed errors, rerun with: -s
==36832== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

aikoven commented 3 years ago

Here's what I did:

rm -rf node_modules/foundationdb/prebuilds

Checked that the code would then crash (i.e. that the prebuilt binaries were really gone), then rebuilt the binaries from source using

cd node_modules/foundationdb
yarn node-gyp-build

I didn't see any change after that, though I did some more experimenting:

  1. I ran my test for a longer time and figured out that I was wrong: with 1.0.3 the process memory settles at around 3.3GB and doesn't grow indefinitely.

  2. I also noticed that the NodeJS heap grows pretty fast at first, so I tried to collect an allocation timeline profile. For some reason, this resulted in both the NodeJS heap and process memory staying at consistent levels while the profile was being collected. Here are some examples:

    Starting profiling right away:

    0s    NodeJS heap: 89.08MB    Process memory: 185MB
    10s   NodeJS heap: 79.06MB    Process memory: 215.2MB
    20s   NodeJS heap: 79.06MB    Process memory: 215.2MB
    30s   NodeJS heap: 79.06MB    Process memory: 215.2MB
    40s   NodeJS heap: 79.06MB    Process memory: 215.2MB
    50s   NodeJS heap: 79.06MB    Process memory: 215.2MB
    60s   NodeJS heap: 79.06MB    Process memory: 215.2MB

    Starting profiling at 30s, running for 30s:

    0s    NodeJS heap: 87.97MB    Process memory: 175.39MB
    10s   NodeJS heap: 154.61MB   Process memory: 542.76MB
    20s   NodeJS heap: 143.91MB   Process memory: 725.77MB
    30s   NodeJS heap: 145.48MB   Process memory: 750.65MB
    # profiling started
    40s   NodeJS heap: 79.04MB    Process memory: 711.21MB
    50s   NodeJS heap: 79.04MB    Process memory: 711.21MB
    60s   NodeJS heap: 79.04MB    Process memory: 711.21MB
    # profiling stopped
    70s   NodeJS heap: 79.04MB    Process memory: 788.95MB
    80s   NodeJS heap: 89.4MB     Process memory: 828.09MB
    90s   NodeJS heap: 92.1MB     Process memory: 826.71MB

    This looks pretty weird. It probably has something to do with GC, but forcing GC using the debugger doesn't free all the consumed memory. Also, I don't see any leaked objects in the NodeJS heap.

  3. I ran the test against NodeJS 14, and it seems to consume memory much more slowly:

    0s    NodeJS heap: 87.6MB     Process memory: 183.71MB
    10s   NodeJS heap: 116.67MB   Process memory: 304.7MB
    20s   NodeJS heap: 82.5MB     Process memory: 348.34MB
    30s   NodeJS heap: 91.73MB    Process memory: 366.36MB
    40s   NodeJS heap: 91.87MB    Process memory: 369.73MB
    50s   NodeJS heap: 90.61MB    Process memory: 371.93MB
    60s   NodeJS heap: 103.81MB   Process memory: 379.21MB

My hypothesis is that under heavy load NodeJS triggers GC less often, and the retained NodeJS objects keep their attached native objects alive as well, making some internal FDB client buffer grow.
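
One way to poke at this hypothesis would be to force GC inside the loop and watch whether process memory stays bounded. A minimal sketch, assuming Node is started with --expose-gc so that global.gc() is available (illustrative only, not the repro code):

// Hypothetical variant of the repro loop. Run with: node --expose-gc test.js
// Forces a full GC every 10k transactions to see whether process memory
// stays bounded when JS objects (and their attached native handles) are
// collected promptly.
const fdb = require('foundationdb');
fdb.setAPIVersion(620);

const db = fdb.open();

(async () => {
  let i = 0;
  while (true) {
    await db.doTn(async tn => {
      await tn.get('');
    });
    if (++i % 10000 === 0) {
      global.gc();
      console.log(i, 'rss:', process.memoryUsage().rss);
    }
  }
})();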

Anyway, let me test 1.0.3 with our production services. Will check back once I have data.

Thank you for your time!

josephg commented 3 years ago

Huh, yeah - maybe something like that. I don't think memory usage should be growing the way it is, though. You're definitely using 1.0.3? (Check node_modules/foundationdb/package.json)

I bet if I rewrote the same test in raw C it would sit at a constant amount of memory... I might try tracking the number of allocated FDB Futures and see what happens to that number over time.

aikoven commented 3 years ago

You're definitely using 1.0.3? (Check node_modules/foundationdb/package.json)

Yep, did just that.

aikoven commented 3 years ago

After running 1.0.3 on one of our production services for a couple days, here's what I've got:

[image]

It can be seen that the major leak went away. The memory still grows, but really slowly.

NodeJS heap:

[image]

Another observation: during load bursts, the consumed memory sometimes grows and is not released afterwards:

[image]

This may have something to do with the FDB client's Fast Allocator (see https://forums.foundationdb.org/t/memory-leak-in-c-api/554/2), so it may be expected.

josephg commented 3 years ago

I’m glad that helped. The memory leak I fixed for 1.0.3 looked pretty significant, and I was surprised the other day when it sounded like it didn’t make much difference.

And hm, it could be something to do with the allocator. But I’d expect those simple loops to sit on a single fdb transaction and reuse it every time, so that’s definitely not the full story.

Snowman2908 commented 4 months ago

Hi! We noticed a memory leak when using watch. Repro repo for debugging: https://github.com/Snowman2908/node-foundationdb-watch-memory-leak
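
For context, a minimal sketch of the kind of watch loop that can show this behaviour. It assumes the watch API from the node-foundationdb README (tn.watch(key) returns a Watch object with a promise and a cancel() method); 'some-key' is a placeholder, and this is not necessarily what the linked repro does:

// Hypothetical loop; the actual repro is in the linked repo.
const fdb = require('foundationdb');
fdb.setAPIVersion(620);

const db = fdb.open();

(async () => {
  while (true) {
    // Create a watch inside a transaction and return it once the txn commits.
    const watch = await db.doTn(async tn => tn.watch('some-key'));
    // Cancel right away so the loop doesn't block waiting for a change,
    // then wait for the watch's promise to settle before looping again.
    watch.cancel();
    await watch.promise;
  }
})();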

Starting profiling:

app-1           | $ /node_modules/.bin/ts-node index.ts
app-1           | NodeJS heap: 94.99MB
app-1           | Process memory: 290.81MB
app-1           | NodeJS heap: 99.58MB
app-1           | Process memory: 305.81MB
app-1           | NodeJS heap: 102.77MB
app-1           | Process memory: 321.31MB
app-1           | NodeJS heap: 105.6MB

End profiling after 10 min:

app-1           | Process memory: 949.81MB
app-1           | NodeJS heap: 95.78MB
app-1           | Process memory: 963.19MB
app-1           | NodeJS heap: 97.89MB
app-1           | Process memory: 976.06MB
app-1           | NodeJS heap: 97.07MB
app-1           | Process memory: 988.94MB
app-1           | NodeJS heap: 96.43MB
app-1           | Process memory: 1002.56MB
app-1           | NodeJS heap: 100.89MB
app-1           | Process memory: 1016.69MB

The situation is similar to what happened with tn: the NodeJS heap size stays at a consistent level, but the total process memory grows.

I use macOS with Rosetta, but we see the same memory leak in our services running in Kubernetes.