Closed: aslakhellesoy closed this issue 11 years ago.
Architecturally, this is by design.
Browserchannel intentionally keeps sessions in memory only, to remove a lot of edge cases where a failed or successful request is delivered out of order to multiple backend servers. There's a whole class of race conditions, caused by old connections and slow proxies, that are almost impossible to reproduce and could surface bugs in browserchannel. Instead we depend on a load balancer providing mostly-sticky sessions, plus rock-solid reconnection logic to allow server hopping without losing state. Transient state can then live only in memory, allowing a server to know authoritatively when a client has timed out (and hence disconnected).
Note that this is how both raw TCP sockets and websockets work.
I agree that it's inconvenient. I'd consider a pull request if it covered all of the obscure edge cases that can crop up on the server, but I'm not quite sure what it would take to convince me of that. At a minimum it would require a fuzzer, and even then some of the sharejs logic would fail (sharejs needs to be able to send a message to one of its clients when a document has been updated - how does it do that if the client could be connected anywhere?).
There's also a middle road involving leaving browserchannel sessions bound to a server but using the oldSID field in a reconnection to do a much faster server rollover, but it wouldn't really fix the problem.
For now, I suggest either setting your load balancer up to do mostly-sticky sessions (both ELB and Varnish can do this; I'm not sure about nginx), or moving to a different socket library.
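That said, nginx's `ip_hash` upstream directive should give you IP-based stickiness, which is probably good enough here. A rough, untested sketch (backend addresses and the mount path are made up):

```nginx
# Untested sketch: hash on the client IP so each client (mostly) keeps
# hitting the same browserchannel backend.
upstream browserchannel {
  ip_hash;
  server 127.0.0.1:3001;
  server 127.0.0.1:3002;
}

server {
  listen 80;
  location /channel {          # adjust to wherever browserchannel is mounted
    proxy_pass http://browserchannel;
    proxy_buffering off;       # don't buffer the long-polling backchannel
  }
}
```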
... And the more I think about this, the more I'm convinced it's impractical. The problem is that a server needs to be able to spontaneously send a message to a particular client. If the client's backchannel hit a random server each time, then as well as a shared session store we would need a way to route messages between servers to the right client.
You could do this with redis pubsub, but you would also have to do message buffering on top of it to make sure messages actually arrive. It's doable, but conservatively it would probably double the LOC in browserchannel. If you're keen, do it in a fork.
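To give a sense of the shape of it, here's a very rough sketch of the routing half only (no buffering, no acks). It assumes the classic node_redis callback API, a hypothetical `localSessions` map of the sessions connected to this process, and that each session object exposes an `id` and a `send()` method:

```js
// Very rough sketch - not part of browserchannel. Each process subscribes to
// one redis channel per locally-connected session, and any process can
// publish to a client by session id without knowing where it's connected.
var redis = require('redis');

var sub = redis.createClient();
var pub = redis.createClient();

var localSessions = {}; // sessionId -> browserchannel session (hypothetical)

function registerLocalSession(session) {
  localSessions[session.id] = session;
  sub.subscribe('bc:' + session.id);
}

sub.on('message', function (channel, message) {
  var session = localSessions[channel.slice(3)]; // strip the 'bc:' prefix
  if (session) session.send(JSON.parse(message));
  // If the client has hopped to another server, the message is dropped here.
  // This is exactly where the extra buffering / ack layer would be needed.
});

// Called from anywhere in the cluster (e.g. by sharejs) to address a client.
function sendToClient(sessionId, message) {
  pub.publish('bc:' + sessionId, JSON.stringify(message));
}
```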
Thanks for the detailed response @josephg. Now that I understand the implications more I'm not so keen on implementing this - sounds too complicated. I'd rather move to a hosting provider where we can have (semi) sticky sessions. Or perhaps try WebSockets.
Thanks! /cc @jbpros @mattwynne
Running multiple browserchannel server processes behind a random/round-robin load balancer does not work. Heroku's request distribution with 2 or more dynos is an example of this.
Using 2 or more processes is how ShareJS scales, so it seems that browserchannel should support this as well.
The reason it doesn't work is that sessions (SID) are only stored in the server's memory. When a client with an established session sends a POST request, the load balancer may route the request to a different process than the one where the session was established. In that case the new process doesn't know about the supplied SID and responds with `400 Unknown SID`. This causes the client to generate a new SID and try to establish a new session, and this goes on forever.
I think that if there were a way to supply a session store to browserchannel (for example one backed by redis), this problem could be fixed. Obviously, session lookup and storage would have to be asynchronous - roughly along the lines of the sketch below.
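To be clear, browserchannel has no such hook today; this is just the shape of the interface I have in mind (the store class, key prefix, and methods are made up):

```js
// Purely illustrative - browserchannel does not expose a session store hook.
// An async store backed by redis might look roughly like this.
var redis = require('redis');

function RedisSessionStore(client) {
  this.client = client || redis.createClient();
}

// Look up a session by SID. Calls back with an error for unknown SIDs,
// which is the case the server currently answers with 400 Unknown SID.
RedisSessionStore.prototype.get = function (sid, callback) {
  this.client.get('bcsession:' + sid, function (err, data) {
    if (err) return callback(err);
    if (!data) return callback(new Error('Unknown SID'));
    callback(null, JSON.parse(data));
  });
};

// Persist (or update) a session's transient state.
RedisSessionStore.prototype.set = function (sid, state, callback) {
  this.client.set('bcsession:' + sid, JSON.stringify(state), callback);
};

// Remove a session once the server decides the client has disconnected.
RedisSessionStore.prototype.del = function (sid, callback) {
  this.client.del('bcsession:' + sid, callback);
};
```

The hard part would obviously be everything around the store (expiry, in-flight message buffers, knowing when a client has really gone away), not the store itself.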
Would you accept a patch for this?