laverdet / node-fibers

Fiber/coroutine support for v8 and node.
MIT License

Adjust external allocated memory less often #429

Closed: benjamn closed this 4 years ago

benjamn commented 4 years ago

Calling v8::Isolate::AdjustAmountOfExternalAllocatedMemory frequently appears to result in excessive garbage collections and wasted CPU cycles, per discussion here and as stated in the Node documentation: "Registering externally allocated memory will trigger global garbage collections more often than it would otherwise."

While I think the V8 implementation could/should do a better job of handling requests to adjust external memory without over-reacting to each and every request, in the meantime the fibers library can work around the issue by not calling uni::AdjustAmountOfExternalAllocatedMemory as often. You can see in the Meteor 1.9 PR discussion that this change seems to have fixed the CPU spikes that multiple developers encountered in production, and passes all of Meteor's tests, which exercise fibers pretty thoroughly.

This fix was inspired by https://github.com/marudor/libxmljs2/pull/22, which addressed https://github.com/nodejs/node/issues/30995. I would note that the current implementation in this PR under-reports external memory usage, since the reporting lags behind the actual usage by up to ~10MB. I'm definitely open to switching things around a bit so that uni::AdjustAmountOfExternalAllocatedMemory gets called with a larger number of bytes than necessary, thereby over-reporting the external memory usage, if you prefer that approach.
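For illustration, here is a minimal sketch of the batching idea described above. The helper name, the static counter, and the ~10MB threshold are assumptions drawn from this description, not the PR's literal diff:

```cpp
#include <cstdint>
#include <v8.h>

// Sketch only: accumulate small adjustments and report them to V8 in
// ~10MB batches instead of on every fiber stack alloc/free. Reported
// usage can lag real usage by up to the batch size, which is the
// under-reporting mentioned above.
static int64_t unreported_bytes = 0;
static const int64_t kBatchSize = 10 * 1024 * 1024;  // assumed ~10MB

void AdjustExternalMemoryBatched(v8::Isolate* isolate, int64_t delta) {
  unreported_bytes += delta;
  if (unreported_bytes >= kBatchSize || unreported_bytes <= -kBatchSize) {
    isolate->AdjustAmountOfExternalAllocatedMemory(unreported_bytes);
    unreported_bytes = 0;
  }
}
```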

Here's another project that benefited from adjusting external allocated memory less often: https://github.com/mapnik/node-mapnik/issues/136

It would be ideal to ship Meteor 1.9 with an official version of fibers, rather than a GitHub URL, but I realize that's asking for your time at a busy/vacation time of year.

laverdet commented 4 years ago

Doesn't v8 already have some threshold logic?

https://github.com/v8/v8/blob/e08436ce07c11eb1a43b2eb5065da1f6a1f89c6b/include/v8.h#L11799-L11838

It seems like v8 won't actually send a notification to the GC unless more than 32MB has been allocated. Each fiber is 1MB, so that would be a lot of fibers.
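For reference, a simplified paraphrase of the threshold check in the v8.h linked above (this is not V8's literal source; the variable names and the stub are approximations):

```cpp
#include <cstdint>

// Paraphrase of v8::Isolate::AdjustAmountOfExternalAllocatedMemory as
// linked above. The isolate tracks external memory and only pokes the
// GC once growth since the last mark-compact exceeds the 32MB threshold
// mentioned above.
static int64_t external_memory = 0;
static int64_t external_memory_at_last_mark_compact = 0;
static const int64_t kMemoryThreshold = 32 * 1024 * 1024;  // 32MB

// Stand-in for V8's internal memory-pressure hook.
static void CheckMemoryPressure() { /* may schedule a GC */ }

int64_t AdjustAmountOfExternalAllocatedMemory(int64_t change_in_bytes) {
  external_memory += change_in_bytes;
  int64_t growth = external_memory - external_memory_at_last_mark_compact;
  if (change_in_bytes > 0 && growth > kMemoryThreshold) {
    CheckMemoryPressure();
  }
  return external_memory;
}
```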

Also did you experiment with just removing AdjustAmountOfExternalAllocatedMemory entirely? That might be a good idea because GC probably isn't finding a lot of orphaned fibers anyway.

benjamn commented 4 years ago

It looks like V8 has imposed a threshold since v6.1.58, which first shipped in Node v9.0.0 (the release that updated V8 to v6.1.534.36). The threshold was only 1MB until v6.1.94, when it increased to 32MB (its current value)… but that change should also have been included in the Node 9 upgrade (and thus in Node 12.14.0 too). Hmm.

Without doing a lot more digging, I don't have a good answer as to why the V8 thresholding was inadequate, though I would note that this area of the code is fairly complicated, with some significant known bugs like getting the subtraction backwards. Perhaps there are still unknown bugs that cause the memory pressure to be checked too often? It's either that, or multiple Meteor developers are misreading their CPU charts in the same way, which seems unlikely.

In other words, I would be more than happy for fibers to stop using AdjustAmountOfExternalAllocatedMemory entirely, given that it seems to be causing more problems than it's solving—which, as you pointed out, may be a positive number of problems versus zero real benefits.

Is there a particular experiment you would like to see to validate this removal, other than running a patched version of fibers through the Meteor test suite?

benjamn commented 4 years ago

@laverdet Ok, from the Meteor perspective, simply not calling AdjustAmountOfExternalAllocatedMemory seems to work.

Would you be willing to release this (much simpler) change as fibers@4.0.3? If you're not 100% confident yet, we could avoid tagging 4.0.3 as latest for the time being, so Meteor could install it explicitly, but it would not be the default version of fibers installed from npm. Even fibers@4.0.3-meteor would allow us to release Meteor 1.9 without depending on a GitHub URL.

Besides your argument about V8 not having much opportunity to collect Fibers, I'm fairly confident this approach is reasonable. Fiber objects are pooled/recycled, which effectively provides a form of garbage collection, and the way V8 does its own thresholding suggests that absolute precision in reporting external memory isn't very important. Skipping the reporting entirely therefore seems safe.
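To make the pooling point concrete, here's a toy sketch of the recycling idea (purely illustrative; this is not fibers' actual Coroutine pool, and the names are made up):

```cpp
#include <cstddef>
#include <vector>

// Toy illustration of fiber stack pooling: finished fibers return
// their stacks to a free list instead of freeing them, so the next
// Fiber reuses an existing allocation. Recycling bounds churn without
// V8's GC being involved, which is why precise external-memory
// reporting buys little here.
class StackPool {
 public:
  static const size_t kStackSize = 1024 * 1024;  // each fiber stack is ~1MB

  void* Acquire() {
    if (!free_.empty()) {
      void* stack = free_.back();
      free_.pop_back();
      return stack;  // recycled, no new allocation
    }
    return ::operator new(kStackSize);
  }

  void Release(void* stack) { free_.push_back(stack); }

  ~StackPool() {
    for (void* stack : free_) ::operator delete(stack);
  }

 private:
  std::vector<void*> free_;
};
```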

laverdet commented 4 years ago

Sounds good to me. Based on my experience, the v8 garbage collector is perfectly capable of shooting itself in the foot without us helping.

laverdet commented 4 years ago

4.0.3 is on npm and untagged as suggested. I didn't test any of the binaries and rushed through the deployment, but I had my fingers crossed, so everything should be fine.

benjamn commented 4 years ago

I did notice in the Galaxy logs that fibers@4.0.3 is falling back to compilation from source because there's a problem with the binary:

```
> fibers@4.0.3 install /app/bundle/programs/server/node_modules/fibers
> node build.js || nodejs build.js

`linux-x64-72-glibc` exists; testing
Problem with the binary; manual build incoming
make: Entering directory `/app/bundle/programs/server/node_modules/fibers/build'
  CXX(target) Release/obj.target/fibers/src/fibers.o
  CXX(target) Release/obj.target/fibers/src/coroutine.o
  CC(target) Release/obj.target/fibers/src/libcoro/coro.o
  SOLINK_MODULE(target) Release/obj.target/fibers.node
  COPY Release/fibers.node
make: Leaving directory `/app/bundle/programs/server/node_modules/fibers/build'
Installed in `/app/bundle/programs/server/node_modules/fibers/bin/linux-x64-72-glibc/fibers.node`
```

However, this was also happening with fibers@4.0.2, and I can install fibers@4.0.3 without compiling it on a different x86_64 Linux machine, so I think this is just a harmless quirk of the specific Docker image we use for Galaxy (maybe a different glibc version, not sure). Anyway, since the fallback compilation succeeds, it should keep succeeding, so this seems like a Galaxy problem we can ignore for now.

@filipenevola If this ever becomes a problem, I think we just need to update our base Ubuntu image here: https://github.com/meteor/galaxy-images/blob/master/galaxy-app/Dockerfile#L1 (let me know if you need my help with that).

laverdet commented 4 years ago

No telling. The Linux binaries come from a Gentoo server I have lying around that hasn't seen an update in 5 years or so.