haskell-distributed / distributed-process

Cloud Haskell core libraries
http://haskell-distributed.github.io

CH or NT must implement keep-alive #408

Open mboes opened 9 years ago

mboes commented 9 years ago

From @edsko on September 24, 2012 12:38

When one Cloud Haskell (CH) process A monitors another process B, it expects to be notified if the connection between them breaks, even when A never sends anything to B and only receives messages from it. Relying on a failed send to detect network problems is therefore not enough. This can be solved at the CH level or at the network-transport (NT) level.

Copied from original issue: haskell-distributed/distributed-process#32
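
A minimal sketch of the receiver-side bookkeeping such a keep-alive would need, assuming B (or the transport on B's behalf) sends a heartbeat at a fixed interval. All names here (`KeepAliveConfig`, `PeerStatus`, `peerStatus`) are hypothetical and are not part of distributed-process or network-transport; timestamps are plain seconds so the example depends only on base.

```haskell
-- Hypothetical keep-alive bookkeeping for a receive-only peer.
-- Nothing here is an existing distributed-process or network-transport API.

-- | Seconds since an arbitrary epoch (an assumption made for this sketch).
type Seconds = Double

data KeepAliveConfig = KeepAliveConfig
  { heartbeatInterval :: Seconds  -- ^ how often the remote side should send something
  , maxMissed         :: Int      -- ^ intervals of silence tolerated before giving up
  }

data PeerStatus = Alive | Suspect | Dead
  deriving (Eq, Show)

-- | Classify a peer purely from the time anything was last received from it.
-- A never has to send to B for this to work, which is the case in question.
peerStatus :: KeepAliveConfig -> Seconds -> Seconds -> PeerStatus
peerStatus cfg lastHeard now
  | silence <= heartbeatInterval cfg = Alive
  | silence <= deadline              = Suspect
  | otherwise                        = Dead
  where
    silence  = now - lastHeard
    deadline = heartbeatInterval cfg * fromIntegral (maxMissed cfg)
```

With `KeepAliveConfig 5 3` and a last message received at t = 100, the peer is `Alive` at t = 103, `Suspect` at t = 112 and `Dead` at t = 120. Whichever layer owns this check would turn the `Dead` result into the notification that the monitoring process expects.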

mboes commented 9 years ago

From @edsko on November 18, 2012 11:40

http://nitoprograms.blogspot.co.uk/2009/05/detection-of-half-open-dropped.html might provide some useful insights.

mboes commented 9 years ago

From @hyperthunk on December 13, 2012 20:49

See http://rabbitmq.1065348.n5.nabble.com/Re-The-rabbitmq-server-stop-command-hangs-td23180.html for an example of this happening in another network protocol. I have some thoughts about this. Firstly, it can be solved at the OS level by tuning kernel parameters and/or switching on TCP keep-alive, but that only works for TCP-based transports.

Personally I think this should be handled at the NT level and that it should be configurable.
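
For reference, the OS-level option mentioned above can be switched on per socket. This is a sketch that assumes the `network` package and an already created TCP socket; `enableKeepAlive` is a hypothetical helper name, not something provided by network-transport. The idle time before the kernel starts probing is still governed by kernel settings (on Linux, `net.ipv4.tcp_keepalive_time`, which defaults to 7200 seconds), so this alone may detect a dead peer only after a long delay.

```haskell
import Network.Socket

-- | Turn on SO_KEEPALIVE for a socket and report whether the kernel
-- now shows the option as enabled. Hypothetical helper for illustration.
enableKeepAlive :: Socket -> IO Bool
enableKeepAlive sock = do
  setSocketOption sock KeepAlive 1        -- any non-zero value enables the option
  flag <- getSocketOption sock KeepAlive  -- read it back from the kernel
  return (flag /= 0)
```

Because the option lives on the TCP socket, it does nothing for other transports, which is the limitation the comment points out.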

mboes commented 9 years ago

From @edsko on December 13, 2012 21:38

> Personally I think this should be handled at the NT level and that it should be configurable.

Agreed.

mboes commented 9 years ago

From @hyperthunk on December 14, 2012 0:28

We do this in RabbitMQ so I've some familiarity with the problem space. I'll take a look at submitting a patch, but things are rather busy at the moment so it might not materialise for a week or so.

mboes commented 9 years ago

From @hyperthunk on December 17, 2012 2:03

> so it might not materialise for a week or so.

Nah, it's going to be quite a bit longer than that before I even start thinking about this one. Still interested in picking it up though => assigning to myself unless someone else wants to come and steal it first. :)