strongloop / strong-cluster-socket.io-store

Implementation of socket.io store using node's native cluster messaging

Partial fix for failing build #4

Closed · bajtos closed this 11 years ago

bajtos commented 11 years ago

Fixed a bug in the ClusterStore implementation and added debug logs to help troubleshoot the problem.

@sam-github Please review.

The build is still failing due to a race condition caused by socket.io's internal design. See lib/manager.js L805 to L808.

Details:

I don't see how this problem can be fixed with the current internal design, as the pub/sub concept does not provide a means for the publisher to wait until all subscribers have received the notification.


Thinking about the problem from a high-level perspective, a perfect RR load balancer might not be the most effective solution for servers under heavy load, as it adds the extra overhead of synchronizing (session) state between worker processes. IMO there should be an option in Node.js core to configure client or session affinity, so that a set of related requests is always handled by the same worker process.

This change should bring two performance-related improvements:

IMO the impact will be even more significant once there is an option to run cluster workers on multiple machines.

I can imagine a Node.js HTTP server using a special cookie to track the affinity. I.e. when an HTTP response is returned to a client, a cookie is set with the ID of the worker that handled the request. When the load balancer sees this cookie in an incoming request, it selects the worker using the ID in the cookie and skips the RR algorithm.
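For concreteness, here is a minimal sketch of the worker-side half of that idea. The cookie name (WORKER_AFFINITY) is made up for illustration, and the balancer-side routing is exactly the piece Node core would need to provide:

```js
// Each cluster worker tags its responses with its own worker id; a
// cookie-aware balancer in front of (or inside) the cluster master could
// then route follow-up requests carrying this cookie to the same worker
// instead of applying round-robin.
var cluster = require('cluster');
var http = require('http');

if (cluster.isMaster) {
  for (var i = 0; i < 2; i++) cluster.fork();
} else {
  http.createServer(function(req, res) {
    // Hypothetical cookie name; any stable worker identifier would do.
    res.setHeader('Set-Cookie',
      'WORKER_AFFINITY=' + cluster.worker.id + '; Path=/');
    res.end('handled by worker ' + cluster.worker.id + '\n');
  }).listen(8000);
}
```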

What are your thoughts on this, @bnoordhuis @piscisaureus @stdarg? If this idea is not dismissed straight away, we should move the discussion somewhere else (I guess a node or nu-js issue would be the best place).

EDIT: SockJS has some useful info on load balancing too.

Sticky sessions

If you plan to deploy more than one SockJS server, you must make sure that all HTTP requests for a single session hit the same server. SockJS has two mechanisms that can help achieve that:

  • URLs are prefixed with server and session id numbers, like /resource/<server_number>/<session_id>/transport. This is useful for load balancers that support prefix-based affinity (HAProxy does).
  • A JSESSIONID cookie is set by SockJS-node. Many load balancers turn on sticky sessions if that cookie is set. This technique is derived from Java applications, where sticky sessions are often necessary. HAProxy supports this method, as do some hosting providers, for example CloudFoundry. To enable this method on the client side, supply a cookie: true option to the SockJS constructor (see the sketch after this list).
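For reference, the client-side enabling step looks roughly like this (assuming the SockJS-client 0.3.x constructor, which takes an options object as its third argument):

```js
// Ask SockJS-node to set the JSESSIONID cookie so a cookie-based load
// balancer can pin all of this session's HTTP requests to one server.
var sock = new SockJS('http://example.com/echo', null, { cookie: true });
sock.onopen = function() {
  sock.send('hello from a sticky session');
};
```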
bajtos commented 11 years ago

A few more thoughts:

bnoordhuis commented 11 years ago

I can imagine a Node.js HTTP server using a special cookie to track the affinity. I.e. when an HTTP response is returned to a client, a cookie is set with the ID of the worker that handled the request. When the load balancer sees this cookie in an incoming request, it selects the worker using the ID in the cookie and skips the RR algorithm.

It's not an unreasonable suggestion, but one potential problem is that the master has to parse and buffer the HTTP headers (in order for the worker to reuse them).

Parsing is reasonably CPU-intensive, though in node.js it's often dwarfed by other things, so it might be a wash.

Buffering is more problematic; an evil client could send massive amounts of HTTP headers and trigger a denial of service that way. It's not an intractable problem but there are some caveats to consider.
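For concreteness, the kind of bounded header peek the master would need might look like the sketch below; the cookie name and the size cap are hypothetical, and anything beyond the cap simply falls back to round-robin rather than being buffered further:

```js
// Hypothetical guard: scan at most MAX_PEEK bytes of the raw request head
// for the affinity cookie. If the cookie is missing (or the headers exceed
// the cap), return null and let the caller fall back to plain round-robin,
// so a client cannot force unbounded buffering in the master.
var MAX_PEEK = 8 * 1024;

function affinityWorkerId(firstChunk) {
  var head = firstChunk.toString('ascii', 0,
    Math.min(firstChunk.length, MAX_PEEK));
  var match = /(?:^|\r\n)Cookie:[^\r\n]*\bWORKER_AFFINITY=(\d+)/i.exec(head);
  return match ? match[1] : null;
}
```

The harder part, as noted above, is not this string scan but the fact that whatever the master reads here still has to be handed over to the chosen worker so the request is not corrupted.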

It also means tighter coupling between the cluster and http modules. That's not the worst thing in the world but it might make future changes more difficult.

sam-github commented 11 years ago

It surprises me that this isn't a more commonly seen problem with socket.io; do most load balancers implement affinity?

Because with socket.io's built-in redis store, propagation would also trigger this bug, causing random auth failures, particularly under load.

I agree that socket.io seems to be losing momentum in the face of other projects. It might have already dropped off the back of the wave.

I can think of some hack-arounds, such as retrying for some time in the case of an auth failure. This would only slow down the client when they are not authorized, which doesn't seem so bad. It's a fix that should go in socket.io, but we could put it in the cluster socket.io store for now, because we have control.

What do you think? Do a few-second loop to see if the auth data will eventually be readable from the store, or received via MQ? (Not sure if it's transmitted as a key/value or as a message.)

It's hacky, but random failures in deployment might be worse.
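A rough sketch of that retry loop, assuming the get(key, callback) style used by socket.io 0.9 store clients; the helper name, key, polling interval, and timeout are made up:

```js
// Hypothetical helper: keep re-reading a key from the store until it shows
// up or a deadline passes. The auth lookup that currently fails when the
// pub/sub notification has not yet reached this worker could be wrapped in
// something like this instead of failing immediately.
function waitForKey(client, key, timeoutMs, cb) {
  var deadline = Date.now() + timeoutMs;
  (function attempt() {
    client.get(key, function(err, value) {
      if (err) return cb(err);
      if (value != null) return cb(null, value);
      if (Date.now() >= deadline)
        return cb(new Error('timed out waiting for ' + key));
      setTimeout(attempt, 50); // poll every 50 ms until the data propagates
    });
  })();
}
```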

sam-github commented 11 years ago

LGTM