holepunchto / hyperdrive

Hyperdrive is a secure, real time distributed file system
Apache License 2.0
1.86k stars 134 forks source link

MaxListenersExceededWarning when creating multiple diff streams #241

Closed poga closed 1 year ago

poga commented 5 years ago

Version

hyperdrive: 9.15.0 random-access-memory: 3.1.1

Code to reproduce

const hyperdrive = require('hyperdrive')
const ram = require('random-access-memory')

let archive = hyperdrive(ram)

for (let i = 0; i < 20; i++) {
  archive.writeFile('foo', 'bar')
}

setTimeout(() => {
  console.log(archive.version) // version = 20

  for (let i = 0; i < 20; i++) {
    let diff = archive.createDiffStream(i)
  }
  /*
  (node:40126) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 append listeners added. Use emitter.setMaxListeners() to increase limit
  (node:40126) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 extension listeners added. Use emitter.setMaxListeners() to increase limit
  (node:40126) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 error listeners added. Use emitter.setMaxListeners() to increase limit
  (node:40126) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 error listeners added. Use emitter.setMaxListeners() to increase limit
  */
}, 1000)

Detail

I believe the core issue is here: https://github.com/mafintosh/hyperdrive/blob/master/index.js#L329.

When we create a diff stream, hyperdrive will create(checkout) a new archive version sharing the same metadata feed. The version archive will register its own event listeners on the same metadata feed. Therefore the leak.

I'm not sure how to fix it. Is there a way to close only the version archive without closing the origin archive too?

poga commented 5 years ago

Just tried the same code on hyperdrive v10. It still leaks listeners.

(node:94165) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 ready listeners added. Use emitter.setMaxListeners() to increase limit

We probably need some way to 'detach' an archive from hypercore feeds. Maybe we can freeze an archive and it will remove all its listeners from the feeds?

I'm going to play around it for a bit and see if it's reasonable.

RangerMauve commented 5 years ago

Is it really a leak if it's attaching new listeners? Maybe we need some sort of soft "close" method which will detach listeners on the old checked out archive once it's not needed. Or on the checked out archive once we're done listening to it.

poga commented 5 years ago

IIUC currently there's no way to detach these new listeners after creating a diff stream. Hence I was thinking that it's a leak.

I agree that calling it a leak might be too harsh. A "soft close" can definitely help in this situation!

RangerMauve commented 5 years ago

Hmm, if it's keeping the listeners there after you've finished with the diff stream it's definitely a memory leak. 😁

I'd imagine something like diffstream.end() should be cleaning up any listeners.

Just to confirm, are you seeing this with hyperdrive v9 or v10? (v10 is in the master branch and uses a different data structure for the filesystem representation)

poga commented 5 years ago

I've tried on v9 and v10 and can reproduce the warning on both versions.

poga commented 5 years ago

There are total 6 six event listeners leaked whenever we create a new checkout. Their corresponding events are:

For the first 5, they're created in the _ready callback in the index.js. Every time we create a new checkout, a new set of listeners is attached onto the same metadata feed used by the origin archive.

Currently there's no way to remove these listeners. Hence I added a new method closeCheckout which works like a "soft close" mentioned in the previous comment.

The last one happens in mountable-hypertrie. I've also submitted a PR for it.

andrewosh commented 5 years ago

Hey @poga,

As per the comment in the closeCheckout PR, I think it'd make more sense to just not register those listeners on checkouts at all and opt for another approach. append, for example, won't make sense on a checkout in the first place, and we might want to say that the others are only available on the base drive.

Another option is for the base drive to track all checkouts and dispatch events to them, instead of registering new listeners. The first way (no events on checkouts) is much simpler, but it's not immediately clear if that would be a confusing limitation. @mafintosh have thoughts on this?

poga commented 5 years ago

Yeah, not registering these listeners on checkouts definitely make more sense.

One limitation I can think of it is that error events will only triggered on the base archive. If an error happened on the base archive, people need to keep track of the relation between archives and checkouts, and propagate error by themself, which might be confusing.