rethinkdb / horizon

Horizon is a realtime, open-source backend for JavaScript apps.
MIT License

Client connection terminated after few minutes #799

Closed arthurvi closed 8 years ago

arthurvi commented 8 years ago

The problem

After upgrading to Horizon 2.0, the client connection is closed after a few minutes. We didn't have this problem with version 1.x. There is an Nginx load balancer in between, but since v1.x had no problems and nothing has changed in the Nginx configuration, that couldn't be the cause, right?

Chrome client console output

[image: screenshot of the Chrome console output]

Log from Horizon server

2016-08-29T15:51:46.017499561Z debug: Client connection established.
2016-08-29T15:51:47.581220413Z debug: Received handshake: {"request_id":0,"method":"unauthenticated"}
2016-08-29T15:51:47.581754381Z debug: Sending response: {"token":"TOKEN","id":null,"provider":"unauthenticated","request_id":0}
2016-08-29T15:51:47.676723264Z debug: Received request from client: {"request_id":1,"type":"subscribe","options":{"collection":"events","order":[["date"],"ascending"],"limit":10}}
2016-08-29T15:51:47.686801051Z debug: Sending response: {...}
2016-08-29T15:51:47.689840945Z debug: Sending response: {...}
2016-08-29T15:51:47.695491620Z debug: Sending response: {"state":"synced","request_id":1}
2016-08-29T15:52:47.583685974Z debug: Received request from client: {"request_id":2,"type":"keepalive"}
2016-08-29T15:52:47.583787242Z debug: Sending response: {"state":"complete","request_id":2}
2016-08-29T15:53:47.597633888Z debug: Received request from client: {"request_id":3,"type":"keepalive"}
2016-08-29T15:53:47.597770435Z debug: Sending response: {"state":"complete","request_id":3}
2016-08-29T15:54:47.608142765Z debug: Received request from client: {"request_id":4,"type":"keepalive"}
2016-08-29T15:54:47.608195623Z debug: Sending response: {"state":"complete","request_id":4}
2016-08-29T15:55:47.617367351Z debug: Client connection terminated.

Server version:

2.0.0

Client version:

2.0.0

encryptio commented 8 years ago

Horizon 1.x used engine.io, and Horizon 2.0 switched to using websockets directly; this means there's no fallback to polling when the websocket connection messes up.

I think this is caused by your load balancer timing out connections. https://github.com/rethinkdb/horizon/issues/527 added keepalives inside the websocket connection once every 60 seconds, but it looks like nginx's default proxy_read_timeout is also 60 seconds, so nginx may give up just before the keepalive gets to it.

You can raise proxy_read_timeout in nginx, and that will likely solve the issue.

We also might want to lower horizon's keepalive timeout to be below 60 seconds to avoid more issues like this in the future... I'll leave that decision to others.
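
To make the timing argument concrete, here is a minimal sketch of the check involved. The helper names and the 5-second default margin are assumptions made for illustration; they are not part of Horizon or nginx.

```javascript
// Hypothetical helpers (not part of Horizon or nginx).
// Returns how many seconds of slack remain between one keepalive
// and the moment the proxy would consider the connection idle.
function keepaliveHeadroom(keepaliveSec, proxyReadTimeoutSec) {
  return proxyReadTimeoutSec - keepaliveSec;
}

// A keepalive interval is treated as safe when it leaves at least
// minHeadroomSec of slack. The 5-second default is an arbitrary assumption.
function isKeepaliveSafe(keepaliveSec, proxyReadTimeoutSec, minHeadroomSec = 5) {
  return keepaliveHeadroom(keepaliveSec, proxyReadTimeoutSec) >= minHeadroomSec;
}

console.log(isKeepaliveSafe(60, 60)); // false: the keepalive races the timeout
console.log(isKeepaliveSafe(30, 60)); // true: 30 seconds of slack
```

With a 60-second keepalive and a 60-second read timeout there is no slack at all, so small delays decide whether the keepalive or the timeout wins.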

deontologician commented 8 years ago

You can also change the keepalive interval in your Horizon constructor, e.g. Horizon({keepalive: 15}), if you want.

arthurvi commented 8 years ago

Thank you! I will test the proposed solutions and report back.

arthurvi commented 8 years ago

I found setting keepalive to 50 solved the problem without adding too much overhead. Thanks!

new Horizon({
  keepalive: 50
});

deontologician commented 8 years ago

I'd say that's good evidence we should bump down the default

SimulatedGREG commented 8 years ago

Just wanted to insert my feedback to say changing proxy_read_timeout in nginx to 80 and setting @horizon/client's keepalive to 50 resolved my disconnection issues. Hope this helps somebody out there. 😄

seddonm1 commented 8 years ago

Also, updating rxjs to rc.2 allows reconnection to work. I have had to manually resubscribe connections for reliability, but it seems OK so far.
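
For anyone wondering what manual resubscription might look like, here is a rough sketch. It is not Horizon's API: `watchFn` is a hypothetical function that returns anything with an Observable-style `subscribe(next, error, complete)`, and the retry limit and delay are arbitrary assumptions. Whether resubscribing alone is enough for a given Horizon setup is also an assumption.

```javascript
// Hypothetical helper, not part of Horizon or RxJS.
// watchFn: returns a fresh object with subscribe(next, error, complete).
// onValue: called with every value received.
// schedule: injectable timer so the retry logic can be tested without waiting.
function subscribeWithResubscribe(watchFn, onValue, options = {}) {
  const { maxRetries = 5, delayMs = 1000, schedule = setTimeout } = options;
  let retries = 0;
  let stopped = false;
  let current = null;

  function start() {
    if (stopped) return;
    current = watchFn().subscribe(
      (value) => {
        retries = 0; // a received value means the stream is healthy again
        onValue(value);
      },
      () => retry(), // stream errored, e.g. the connection dropped
      () => retry()  // stream completed unexpectedly
    );
  }

  function retry() {
    if (stopped || retries >= maxRetries) return;
    retries += 1;
    schedule(start, delayMs);
  }

  start();

  return {
    stop() {
      stopped = true;
      if (current && typeof current.unsubscribe === "function") {
        current.unsubscribe();
      }
    },
  };
}
```

Usage would be along the lines of `subscribeWithResubscribe(() => someQuery.watch(), handleDocs)`, where `someQuery` and `handleDocs` are placeholders for your own query and handler.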