nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
107.34k stars 29.48k forks source link

RFC cluster: make scheduler pluggable #10880

Closed cjihrig closed 7 years ago

cjihrig commented 7 years ago

I believe a few people have asked for this directly and indirectly in the past. It would be nice to be able to customize the cluster master's scheduling policy. For example, I've seen people request the ability to "pause" a worker.

One possibility is to define a custom scheduling policy, and let the user pass in a constructor like our existing SharedHandle or RoundRobinHandle. Users would need to define a constructor function, add() method, and remove() method. I put together a small proof of concept, including example usage, in https://github.com/cjihrig/node-1/commit/427beec00cea49e7199eac065eb596224ad18677.

The downside is that it exposes some internal ugliness like the constructor function taking key, address, port, addressType, fd, and flags arguments. A simpler alternative might be to only require the user to define a scheduling function like RoundRobinHandle#distribute() and then use that with the RoundRobinHandler unless the SharedHandle is used.

Thoughts?

sam-github commented 7 years ago

Perhaps tangential to this, but I would like to see cluster removed from node core, and published and maintained as an npm module, so that people can choose how they want it to behave, possibly implement HTTP-level session affinity, better load balancing algorithms, and even have differing opinions about whether such things should or should not be in cluster, and even differ on questions such as what .listen(0) should do, which has been very contentious.

Moving cluster out is trivial for master, its all code that could be done as a module now.

For the worker, it requires the master to force child processes to run some master-specified js code before the user's code, and to replace the net and dgram factory functions so they can be delegated to the master.

cjihrig commented 7 years ago

cluster could be removed from core and published to npm, but it would still need to ship with core to avoid likely huge ecosystem breakage.

mscdex commented 7 years ago

IMHO cluster is worth keeping in core because I see it as one of those things that is very useful and is not necessarily something that is easy to get "right."

Ordinarily I'd say having someone maintain it as an npm module would be fine, but the experiences I've had with readable-stream and compatibility with core, it seems like we'd end up in similar situations where we can't make the kinds of improvements in the core cluster module without causing potentially major issues for users of the npm module...

sam-github commented 7 years ago

Isn't the problem with readable-stream is that we have streams in core surfacing through the net/fs/http/etc. APIs that readable-streams has to be compatible with? This isn't the situation with cluster.

  1. If there was a cluster on npm the one kept in core for backwards compat would be feature frozen, and close to zero maintenance. All requests for features, including semver major changes, would be instantly directed to the external module. Core could focus on fundamental mechanisms, not policy.
  2. As it happens, cluster happens to be one of the easier things to remove from node core. Its true that code uses it, but there are probably less than a dozen modules that do, it would be interesting to see the stats on that, I'm not sure how to generate them. Its not a module that shows up in every app, like url or fs, but one that is used by various production tooling (strong-supervisor, pm2, ccluster, etc.).

Cluster never worked well, so I would not say "we got it right". I would say we did quite well considering what options are available, but that's about it. IMO its sucking the energy from development of better solutions. When core has something, people assume that it is actively being made the best it can be. For example, to solve the HTTP session affinity problems that cluster creates (google around for websockets and cluster, its awful), it would have to do HTTP protocol level parsing in the master. At that point, its performance would be terrible, and node just isn't constructed to compete with nginx for this. So, we aren't doing this, but some external project might, they might even decide to bind nginx into an npm module, or something equally as audacious. But that isn't happening, because we offer what looks like an easy close to zero-config alternative, but as people work with it more, they realize it doesn't work so well.

bnoordhuis commented 7 years ago

Some of its implementation details leak into other built-in modules (net, dgram, the debugger.) If we rip out cluster, we need to come up with a semi-public API that lets cluster do its work, plus a stability policy (what versions of cluster-on-npm are expected to work with what versions of node.)

It's a discussion that should be moved to its own issue, though. :-)

mscdex commented 7 years ago

Isn't the problem with readable-stream is that we have streams in core surfacing through the net/fs/http/etc. APIs that readable-streams has to be compatible with?

My thoughts were more about making backwards-incompatible improvements. There are things we've wanted to do in various modules, but can't because of the readable-stream npm module.

One particular example of this is the direct usage of ._events in streams (it was making sure its 'error' event handler was always called first). Even when we changed that by adding a public API to prepend an event handler (instead of append) and modified readable-stream to use that API when available, that only helps for situations where users are pulling the latest versions of readable-stream, meanwhile there are many modules and users who are not pulling in the latest (either because they explicitly peg to a particular version of the module or they use an older semver range). So now we basically have our hands tied until we're confident that that number of people not pulling in those changes is small enough that we can just go ahead and make the changes in core.

Yemanu commented 7 years ago

@cjihrig I like the refactoring in cjihrig@427beec which makes code more readable.

Since the Scheduler is now expose to public API, it is easier work on the prototype as required for custom scheduler.

It feels odd to require internal as shown below, so it would be nice to have it in a different directory. const Scheduler = require('internal/cluster/round_robin_handle');

sam-github commented 7 years ago

@mscdex that is a very interesting situation, though I'm not sure we should coddle people who deliberately choose to nail their dep versions. If you choose to do that, you also choose to manually update them when updating node. If you have deliberately chosen the pain, its hard to argue when you get it. Though I guess the problem is people with loose dep specs depending on a useful npmjs module where its author made the mistake of pinning their deps get hurt. :-( Still, the general drift here, as for cluster, should be to a more rigid (and smaller) node API. The src of the issues here seems to be an overly complex stream API in node. Too bad it wasn't a simpler callback based one.

So, to be clear, I'm not expecting that if someone spends a bunch of time to improve cluster that those changes will be rejected, but I do hope we can get ourselves out of these API traps in the future by narrowing the scope and improving the quality of node's core APIs. Considering the jams we get into when we merge in large complex APIs, like streams or cluster, and then can no longer improve them because some significant set of community modules grow to depend on them, I suggest we start to be more conservative in this.

cjihrig commented 7 years ago

Any thoughts on the original subject of this issue? :-D

sam-github commented 7 years ago

Pluggability: +/- 0, I don't want to work on it, but if someone does it, it fits the "can't be done anywhere but core" test, so I don't see why it would rejected out of hand, if it meets quality.

Big issue would be to do it in a way that doesn't expose more internals pinning us to the current implementation.

Also, destabilizing the current code base would be a negative, so hopefully its not a huge change.

bnoordhuis commented 7 years ago

The downside is that it exposes some internal ugliness like the constructor function taking key, address, port, addressType, fd, and flags arguments.

Yes, that seems bad; I can't think of any redeeming qualities. Any change to the plumbing would automatically become semver-major.

A simpler alternative might be to only require the user to define a scheduling function like RoundRobinHandle#distribute() and then use that with the RoundRobinHandler unless the SharedHandle is used.

That seems much better (and I remember discussing that elsewhere although I don't remember where.)

redonkulus commented 7 years ago

@cjihrig @bnoordhuis @sam-github what are the next steps to move this forward? We are currently using an internal fork of Node without own hacks in the cluster module to get what we want. But this makes it extremely difficult to stay up to speed with the latest releases. Would love to see a landing spot for this use case (without the major semver breakage if possible).

sam-github commented 7 years ago

@redonkulus someone has to volunteer to do the work, maybe you?

jasnell commented 7 years ago

The downside is that it exposes some internal ugliness like the constructor function taking key, address, port, addressType, fd, and flags arguments.

Not necessarily. We could potentially create a base class for custom schedulers that userland would extend to do their own things. This base class could abstract many of those details... e.g. something along the lines of:

class MyCustomScheduler {
  init( ... ) { ... }
  add(worker, send) { ... }
  remove(worker) { ...}
}
cluster.customScheduler(MyCustomScheduler);

Then, within cluster...

customScheduler(proto) {
  return class CustomScheduler extends proto {
    constructor( ... ) {
      super(); // default constructor
      // do whatever we need
      init(/* pass stuff in */);
    }
  };
}

Hiding whatever API ugliness we currently have would be fairly straightforward.

cjihrig commented 7 years ago

someone has to volunteer to do the work, maybe you?

I'm volunteering. I already started on it a bit, but wanted feedback first.

redonkulus commented 7 years ago

@cjihrig do you have enough feedback to get started with a PR?

cjihrig commented 7 years ago

@redonkulus yea, I'm working on it now.

cjihrig commented 7 years ago

@redonkulus sorry, I was traveling most of this week. I have opened a WIP PR #11546

fedoral commented 7 years ago

nice job

jasnell commented 7 years ago

@cjihrig closed the PR #11546 that worked on this. Accordingly I'm going to close this issue. We can revisit if someone else wants to take over this work.