kappa-db / multifeed

Multi-writer hypercore.

test: add a memory leak test #51

Open okdistribute opened 3 years ago

okdistribute commented 3 years ago

This is a failing test for the memory leak. The system will print `Killed` in less than 30 seconds. Check /var/log/syslog for the out-of-memory message. Related to https://github.com/cabal-club/commons/issues/15

If the issue is having too many hypercores open at once, an approach like a manifest and request middleware might also fix the memory leak issue for most production use cases: https://github.com/kappa-db/multifeed/pull/26

Or, some way to replicate in batches, and close the feeds that aren't actively replicating.
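
This isn't multifeed's actual API, but here's a rough sketch of what that batching idea could look like; `openFeed` and `syncFeed` are hypothetical placeholders for however the feeds get opened and replicated, and the point is just that only one batch of hypercores is open at a time:

    // Hypothetical sketch: replicate feeds in bounded batches, closing each
    // batch before opening the next so closed feeds can release their memory.
    // `openFeed(key)` and `syncFeed(feed)` are placeholders, and feeds are
    // assumed to expose a promise-returning close().
    async function syncInBatches (feedKeys, batchSize, openFeed, syncFeed) {
      for (let i = 0; i < feedKeys.length; i += batchSize) {
        const batch = feedKeys.slice(i, i + batchSize)
        const feeds = await Promise.all(batch.map(openFeed)) // open only this batch
        await Promise.all(feeds.map(syncFeed))                // replicate it
        await Promise.all(feeds.map((f) => f.close()))        // close before moving on
      }
    }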

To mimic a real-life scenario for cabal, about 70% of the feeds are empty in this test -- only 30% of them have 1 log entry.
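
For anyone skimming, a minimal sketch of that kind of setup (not this PR's exact test code; it assumes multifeed's `multifeed(storage, opts)` constructor, its `writer(name, cb)` callback, and random-access-memory for storage):

    // Sketch only: create `n` multifeeds where roughly 30% of the writers get
    // a single log entry and the rest stay empty, mimicking a mostly-idle cabal.
    const multifeed = require('multifeed')
    const ram = require('random-access-memory')

    function createPeers (n, cb) {
      const peers = []
      let pending = n
      for (let i = 0; i < n; i++) {
        const multi = multifeed(ram, { valueEncoding: 'json' })
        peers.push(multi)
        multi.writer('local', (err, feed) => {
          if (err) return cb(err)
          if (i % 10 < 3) feed.append({ type: 'chat/text', text: 'hi' }, done)
          else done()
        })
      }
      function done (err) {
        if (err) return cb(err)
        if (--pending === 0) cb(null, peers)
      }
    }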

hackergrrl commented 3 years ago

This is great @okdistribute.

I poked at this a bit and wrote a patch! I noticed something big first: my machine can't even allocate 500 multifeeds in memory. Allocating and syncing everything at once also overwhelmed my computer, so I swapped run-parallel for async-series.

I also made the numbers a bit smaller, and used process.memoryUsage() to try and see how much heap the syncs were sucking up, instead of relying on the process to get killed. (My computer doesn't seem to nicely kill it; instead it sucks up my swap and freezes everything.)
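
This isn't the actual patch, but roughly the shape of the measurement, assuming `replications` is the test's array of callback-taking sync tasks and using async-series as mentioned above:

    // Sketch: run the syncs one at a time and log heap usage after each,
    // instead of waiting for the OS to kill the process.
    const series = require('async-series')

    series(replications.map((replicate, i) => (done) => {
      replicate((err) => {
        if (err) return done(err)
        const mb = process.memoryUsage().heapUsed / 1e6
        console.log('heap after sync', i, '=', mb.toFixed(1), 'mb')
        done()
      })
    }), (err) => {
      if (err) throw err
      console.log('all syncs finished')
    })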

From there, I noticed that the memory use goes up non-linearly with the number of syncs. For example, 10 syncs uses 34mb of heap, 20 uses 114mb, and 40 syncs uses 447mb.

I didn't want to push on top of your work, and have been wanting to mess with non-github workflows, so here's a patch you can apply locally to see if you like it:

    curl -sL http://eight45.net/pub/0001-multifeed.patch | git am

okdistribute commented 3 years ago

Thanks @noffle! Feel free to push the patch here, I agree about smaller increments, I picked a big number out of laziness 😅

lejeunerenard commented 3 years ago

@okdistribute this is a great example of how easy it is to bloat a multifeed with feeds! It's clear that as you add more peers, the memory usage balloons.

I did some poking around, and I'm not sure this test measures quite what you want: it simulates multiple peers in one program (and hence one heap), so the final heapUsed value includes all the simulated peers, i.e. the entire network. Here's my analysis to explain:

@noffle was right that the memory goes up non-linearly with the number of multifeed peers. Looking at the heap snapshot, I found that ~89% of the memory used was retained by Feeds, a.k.a. hypercores (one way to capture such a snapshot is sketched below the table). So I did a series of tests and got the following results:

| Multifeeds (n) | Feeds (f) | Post-sync memory |
| --- | --- | --- |
| 2 | 6 | 9.438872 mb |
| 4 | 17 | 15.238376 mb |
| 10 | 74 | 46.004 mb |
| 20 | 249 | 146.085864 mb |
| 30 | 524 | 290.418824 mb |
| 40 | 899 | 492.932712 mb |

See [1] for how f is correlated to n
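
For anyone who wants to reproduce that heap analysis (the snapshot above may well have been captured differently), Node's built-in v8 module can write a snapshot from inside the test, and the resulting file opens in Chrome DevTools' Memory tab:

    // Write a .heapsnapshot file that can be loaded into Chrome DevTools.
    const v8 = require('v8')
    const file = v8.writeHeapSnapshot()
    console.log('wrote heap snapshot to', file)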

Suggested Changes to the Test

I believe that not all multifeeds should be included in the memory results; instead, each multifeed should be deleted once it has replicated to one persistent multifeed, to simulate transient peers connecting. Here is an example patch that deletes all but the fully synced multifeed:

    console.log('pre-sync heap', process.memoryUsage().heapUsed / 1000000, 'mb')
    series(replications, (err) => {
      if (err) throw err
      console.log('replicated! everything!')

      console.log('pre-closing heap', process.memoryUsage().heapUsed / 1000000, 'mb')

      // Keep a reference to the first multifeed only; every other multifeed is
      // closed and dropped from the array so the GC can reclaim it.
      let keep = feeds[0]
      let closures = []
      for (let i = feeds.length - 1; i > 0; i--) {
        closures.push((done) => {
          let a = feeds[i]
          delete feeds[i] // drop the array's reference before closing
          a.close(done)
        })
      }

      series(closures, (err) => {
        if (err) throw err
        global.gc() // force a collection so heapUsed reflects the closed feeds

        console.log('keep.feeds().length', keep.feeds().length)
        console.log('post-closing heap', process.memoryUsage().heapUsed / 1000000, 'mb')
        debugger
        t.end()
      })
    })

This requires the --expose-gc flag when running node (see the example after the table below); without it, heapUsed includes multifeeds that no longer have any references. Even so, the garbage collection doesn't get every floating reference: I still have approximately 24 Feeds in the heap at the breakpoint above. Eventually, after multiple GCs, it goes down to the expected 21 Feeds. The resulting 'post-closing' heapUsed for 20 multifeeds total is 20.720344 mb, compared to the 'pre-closing' 147.788704 mb. I get similar results with other values of n:

| Multifeeds (n) | Feeds (f) | Post-closing memory |
| --- | --- | --- |
| 10 | 14 | 14.090496 mb |
| 20 | 24 | 20.720344 mb |
| 30 | 34 | 27.0382 mb |
| 40 | 43 | 33.731832 mb |
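
To be explicit about the flag mentioned above, the test just needs to be launched with the GC exposed; the file name here is only a placeholder for whatever this PR's test file ends up being called:

    node --expose-gc test/memory-leak.js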

Concluding Remarks

Sorry for the tome, but I hope this helps! Right now, a middleware layer would be awesome, since weeding out feeds that a peer doesn't want to host would go a long way. Alternatively, people can save a bit of memory if the storage is file-based (though I do believe memory still increases over time with random-access-file as the storage).

Footnote

[1] The number of feeds (f) goes up much quicker than the number of multifeeds (n), which makes sense given they are replicating feeds to one another. But I expected f = n² + n, since every multifeed has 1 feed as the root and then n feeds, one for each multifeed it gets via replication, including itself. Instead f = 2n + (1 + 2 + ... + (n - 1)) + (n - 1), with the 2n standing for the root and local feed of each node, and the (1 + 2 + ... + (n - 1)) + (n - 1) standing for the feeds each node replicates from nodes other than itself. This sum comes from the fact that the replication is done in series, with each node only syncing with the previous and next node when they are available. So the first node only gets the second node's feeds, and each node after that adds one more feed to the running total. The last node is the one exception, as it has no following node to sync with.
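
For what it's worth, that formula matches the table of feed counts above exactly; here's a tiny check (not part of the test):

    // f(n) = 2n + (1 + 2 + ... + (n - 1)) + (n - 1) = 2n + n(n - 1)/2 + n - 1
    const f = (n) => 2 * n + (n * (n - 1)) / 2 + (n - 1)
    const observed = { 2: 6, 4: 17, 10: 74, 20: 249, 30: 524, 40: 899 }
    for (const [n, feeds] of Object.entries(observed)) {
      console.log(`n=${n}: formula=${f(Number(n))}, observed=${feeds}`)
    }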

Prior to @noffle's rework to run the replications in series, the multifeeds might have fully replicated, since the parallel replication streams may have stayed open long enough for the feeds to propagate out. One potential solution is to additionally sync the last node with the nodes it wasn't synced with, to ensure they all end up with the same feeds. Though this may not matter if we narrow the test to one multifeed.

cblgh commented 3 years ago

mafintosh says:

> @cblgh you might wanna rerun that leak test tho, we have no known leaks in [hyper-]core since 1-2 months ago

cblgh commented 3 years ago

& a comment from @rangermauve:

> I was trying to find a leak in multifeed actually, but I couldn't find anything when I did heap dumps. Like, I had it running for a while and connecting and disconnecting periodically, but it didn't look like anything new was being kept other than system garbage like compiled code. :person_shrugging: