Closed · bajtos closed this issue 11 years ago
A few more thoughts:
I can imagine a Node.js HTTP server could use a special cookie to track affinity. I.e. when an HTTP response is returned to the client, a cookie is set with the ID of the worker that handled the request. When the load balancer sees this cookie in an incoming request, it selects the worker using the ID from the cookie and skips the RR algorithm.
It's not an unreasonable suggestion, but one potential problem is that the master has to parse and buffer the HTTP headers (in order for the worker to reuse them).
Parsing is reasonably CPU-intensive, though in node.js it's often dwarfed by other things, so it might be a wash.
Buffering is more problematic; an evil client could send massive amounts of HTTP headers and trigger a denial of service that way. It's not an intractable problem, but there are some caveats to consider.
It also means tighter coupling between the cluster and http modules. That's not the worst thing in the world, but it might make future changes more difficult.
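For concreteness, here is a rough sketch of what that could look like; this is hypothetical code, not anything in node core or in this PR, and the worker_id cookie name, the port, and the MAX_HEADER_BYTES cap are invented for illustration. The master peeks at the request headers (with a size cap, to address the DoS concern above), picks the worker named by the affinity cookie or falls back to round-robin, and hands the socket plus the already-buffered bytes over to that worker:

```js
const cluster = require('cluster');
const http = require('http');
const net = require('net');
const os = require('os');

const PORT = 8000;
const MAX_HEADER_BYTES = 8 * 1024; // cap on what the master will buffer per connection

if (cluster.isMaster) {
  const workers = [];
  for (let i = 0; i < os.cpus().length; i++) workers.push(cluster.fork());
  let next = 0; // round-robin fallback when there is no affinity cookie

  net.createServer({ pauseOnConnect: true }, (socket) => {
    let buffered = Buffer.alloc(0);
    const onData = (chunk) => {
      buffered = Buffer.concat([buffered, chunk]);
      if (buffered.length > MAX_HEADER_BYTES) {
        socket.destroy(); // refuse to buffer unbounded headers (the DoS concern)
        return;
      }
      if (buffered.indexOf('\r\n\r\n') === -1) return; // headers not complete yet
      socket.removeListener('data', onData);
      socket.pause();

      // Pick the worker named by the affinity cookie, or fall back to RR.
      const m = /worker_id=(\d+)/.exec(buffered.toString('latin1'));
      const worker = m ? workers[Number(m[1]) % workers.length]
                       : workers[next++ % workers.length];

      // Hand over the socket plus the bytes we already consumed.
      worker.send({ cmd: 'sticky', head: buffered.toString('base64') }, socket);
    };
    socket.on('data', onData);
    socket.resume();
  }).listen(PORT);
} else {
  const server = http.createServer((req, res) => {
    // Tag the response so the balancer can route the next request back here.
    res.setHeader('Set-Cookie', 'worker_id=' + (cluster.worker.id - 1));
    res.end('handled by worker ' + cluster.worker.id + '\n');
  });

  process.on('message', (msg, socket) => {
    if (!socket || msg.cmd !== 'sticky') return;
    // Re-inject the bytes the master read, then treat the socket as freshly accepted.
    socket.unshift(Buffer.from(msg.head, 'base64'));
    server.emit('connection', socket);
    socket.resume();
  });
}
```

A real implementation would also have to handle headers that never complete, keep-alive connections whose later requests carry a different cookie, and data racing the handoff, which is roughly the parsing/buffering/coupling cost described above.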
It surprises me that this isn't a more commonly seen problem with socket.io. Do most load balancers implement affinity?
Because with socket.io's built-in redis store, propagation would also trigger this bug, causing random auth failures, particularly under load.
I agree that it seems socket.io is losing momentum in the face of other projects. It might have already dropped off the back of the wave.
I can think of some workarounds, such as retrying for some time in the case of an auth failure; this would only slow down clients when they are not authorized, which doesn't seem so bad. It's a fix that should go in socket.io, but we could put it in the cluster socket.io store for now, because we have control over that.
What do you think? Run a loop for a few seconds to see if the auth data eventually becomes readable from the store, or is received via the message queue? (I'm not sure whether it is transmitted as a key/value or as a message.)
It's hacky, but random failures in deployment might be worse.
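Concretely, I'm thinking of something like the sketch below; the helper, the key name, and the timings are invented, and it only assumes a store object with a node-style get(key, cb):

```js
// Hypothetical helper, not part of socket.io: retry reading a key from the
// store for a bounded amount of time before giving up.
function getWithRetry(store, key, timeoutMs, intervalMs, cb) {
  const deadline = Date.now() + timeoutMs;
  (function attempt() {
    store.get(key, function (err, value) {
      if (!err && value != null) return cb(null, value); // found it
      if (Date.now() >= deadline)
        return cb(err || new Error('timed out waiting for ' + key));
      setTimeout(attempt, intervalMs); // not there yet, try again shortly
    });
  })();
}

// usage (hypothetical key name): wait up to 3s for the auth data to propagate
// getWithRetry(store, 'auth:' + sessionId, 3000, 100, function (err, auth) {
//   if (err) { /* give up and reject the handshake */ }
// });
```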
LGTM
Fixed a bug in the ClusterStore implementation and added debug logs to help troubleshoot the problems.
@sam-github Please review.
The build is still failing due to a race condition caused by socket.io's internal design. See lib/manager.js L805 to L808.
Details: the auth data is propagated to the other workers via ClusterStore.prototype.publish(). I don't see how this problem can be fixed with the current internal design, as the pub/sub concept does not give the publisher a way to wait until all subscribers have received the notification.
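For illustration, here is a stripped-down sketch of the pattern (not the actual ClusterStore code; the message shape and names are invented): each worker keeps a local copy of the shared state, publish() updates its own copy and asks the master to broadcast to the others, and the call returns as soon as the message is queued, so there is no point at which the publisher knows every other worker has applied the update:

```js
const cluster = require('cluster');

if (cluster.isMaster) {
  const workers = [cluster.fork(), cluster.fork()];
  // The master only relays: a message published by one worker is forwarded
  // to all the other workers.
  workers.forEach((worker) => {
    worker.on('message', (msg) => {
      if (!msg || msg.cmd !== 'publish') return;
      workers.forEach((other) => {
        if (other !== worker) other.send(msg);
      });
    });
  });
} else {
  const data = {}; // this worker's local copy of the shared state

  function publish(key, value) {
    data[key] = value;
    // Fire-and-forget: process.send() returns as soon as the message is
    // queued, with no acknowledgement from the subscribers, so a request
    // routed to another worker right after this call can still observe
    // the old value.
    process.send({ cmd: 'publish', key: key, value: value });
  }

  process.on('message', (msg) => {
    if (msg && msg.cmd === 'publish') data[msg.key] = msg.value;
  });
}
```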
Thinking about the problem from a high-level perspective, a perfect RR load balancer might not be the most effective solution for servers under heavy load, as it adds the extra overhead of synchronizing (session) state between worker processes. IMO there should be an option in Node.js core to configure client or session affinity, so that a set of related requests is always handled by the same worker process.
This change should bring two performance-related improvements:
IMO the impact will be even more significant once there is an option to run cluster workers on multiple machines.
I can imagine a Node.js HTTP server could use a special cookie to track affinity. I.e. when an HTTP response is returned to the client, a cookie is set with the ID of the worker that handled the request. When the load balancer sees this cookie in an incoming request, it selects the worker using the ID from the cookie and skips the RR algorithm.
What are your thoughts on this, @bnoordhuis @piscisaureus @stdarg? If this idea is not dismissed straight away, we should move the discussion somewhere else (I guess a node or nu-js issue would be the best place).
EDIT: SockJS has some useful info on load balancing too.