mboes opened this issue 9 years ago
From @edsko on November 7, 2012 9:59
A related issue is when we should garbage collect connections. One option is to clear them periodically after some timeout period. Another possibility is to collect them when process A holds no further references to process B (using weak references somehow). The main difficulty there is that `ProcessId` is currently a stateless object, and must remain so as long as we have pure decoding (`ProcessId`s are serializable, after all), unless we resort to `unsafePerformIO` trickery.
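To see the tension, a small sketch with toy types (not the real `ProcessId`): if a process identifier carried runtime state, say a mutable slot or weak reference used to garbage collect the connection to it, its `Binary` instance could no longer be pure, because `get` runs in the pure `Get` monad.

```haskell
import Data.Binary (Binary (..))
import Data.IORef (IORef, newIORef)
import System.IO.Unsafe (unsafePerformIO)

-- Stand-in for today's stateless ProcessId: just wire data.
newtype WireId = WireId Int

instance Binary WireId where
  put (WireId n) = put n
  get = WireId <$> get

-- Hypothetical stateful variant: wire data plus runtime-only state
-- (here a mutable flag standing in for connection bookkeeping).
data StatefulId = StatefulId WireId (IORef Bool)

instance Binary StatefulId where
  put (StatefulId w _) = put w
  -- 'get' is pure, so the only way to conjure the runtime state here is
  -- unsafePerformIO -- the "trickery" mentioned above.
  get = do
    w <- get
    pure (StatefulId w (unsafePerformIO (newIORef False)))
```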
From @edsko on November 7, 2012 12:50
Note that there is a manual workaround for people for whom this is an important issue: you can manually clean up connections by using `reconnect`. This is described in more detail in a recent blog post.
Using `usend` instead of `send` avoids this problem, because `usend` does not create a connection per process pair. The problem would still come up, at a different scale, when there are very many nodes.
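A minimal sketch of both workarounds, assuming a toy request/reply server; the message type and the function names are illustrative, not part of the library:

```haskell
import Control.Distributed.Process
import Control.Monad (forever)

-- Reply with 'usend': no per-client connection is kept alive, so the
-- sender does not accumulate state, at the cost of weaker guarantees.
pong :: Process ()
pong = forever $ do
  (client, n) <- expect :: Process (ProcessId, Int)
  usend client (n + 1)

-- Alternatively, keep using 'send' but drop the implicit connection
-- afterwards with 'reconnect', accepting possible message loss and
-- reordering towards that peer.
serve :: Process ()
serve = forever $ do
  (client, n) <- expect :: Process (ProcessId, Int)
  send client (n + 1)
  reconnect client
```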
From @edsko on November 7, 2012 9:56
When process A sends a message to process B, we must open a connection from A to B and, currently, keep that connection open for the duration of A's lifetime in order to maintain ordering guarantees. If A sends messages to lots of different processes (think of a server responding to clients), this will result in a space leak.
What we need is a way to garbage collect connections when they are no longer used, but still maintain ordering guarantees. There are (at least) two ways in which we might implement this:
1. Client side. When we close the connection, we wait for an acknowledgement from the other side that the connection has been closed. Once we have received this acknowledgement, we know that all messages we sent have been received, and hence it is safe to open a new connection. If the sender starts sending more messages before receiving the acknowledgement, these messages must be buffered.

   We could implement this at the Cloud Haskell level and have the node controller send the acknowledgement. This introduces the question of what to do about connections to the node controller itself: we already have implicit reconnect to and from node controllers, implying potential message loss, but what we don't want is reordering of messages to node controllers.

   We could instead implement it at the `Network.Transport` level and introduce a new `OutgoingConnectionClosed` event, but there is a technical difficulty here too: outgoing connections don't have identifiers on the sender side, only on the receiver side, so we somehow need to be able to identify to the sender which connection got closed, without making `connect` synchronous. (The sender allocates part of the receiver-side ID, but only half of it.) One easy solution is to allow the sender to provide a token which will be returned in the `OutgoingConnectionClosed` event. (A sketch of this Transport-level variant appears at the end of this comment.)

2. Server side. When we close the connection, we remember the connection ID (but see the discussion above: we don't know connection IDs sender side; similar solutions could be proposed here). Then, when we open a new connection, we first send a message saying "all messages you receive on this connection must be delivered after you have received all messages on connection X"; this implies server-side buffering (a sketch of this handshake follows after this list).
Of course, this by itself doesn't gain us much, because now we still have to maintain state: the connection ID of the last connection to every process we have ever sent a message to. So the question becomes: when can we forget this state? This seems to require some sort of acknowledgement from the server, though.
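For illustration, a minimal sketch of the receiver-side bookkeeping the server-side approach would need. The types are hypothetical (not part of any existing API): the first message on a fresh connection names the predecessor connection whose messages must be delivered first, and anything arriving before that connection is closed gets buffered.

```haskell
import qualified Data.Map.Strict as Map
import Data.ByteString (ByteString)
import Data.Word (Word64)

type ConnectionId = Word64

-- Hypothetical control message sent first on a new connection:
-- "deliver my messages only after connection X is fully delivered".
data Handshake = DeliverAfter ConnectionId

-- Receiver-side state for one sender: connections already fully delivered,
-- plus messages buffered because their predecessor is still open.
data PeerState = PeerState
  { closedConns :: [ConnectionId]
  , buffered    :: Map.Map ConnectionId [ByteString]  -- keyed by predecessor
  }

-- A message arrives on a connection whose handshake named 'predecessor':
-- deliver it if the predecessor is already closed, otherwise buffer it.
onReceive :: PeerState -> ConnectionId -> ByteString -> (PeerState, [ByteString])
onReceive st predecessor msg
  | predecessor `elem` closedConns st = (st, [msg])
  | otherwise =
      ( st { buffered = Map.insertWith (flip (++)) predecessor [msg] (buffered st) }
      , [] )

-- When the predecessor connection is finally closed, flush its buffer.
onClosed :: PeerState -> ConnectionId -> (PeerState, [ByteString])
onClosed st cid =
  ( st { closedConns = cid : closedConns st
       , buffered    = Map.delete cid (buffered st) }
  , Map.findWithDefault [] cid (buffered st) )
```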
Client side seems the easier way to solve this problem.
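For concreteness, here is a sketch of what the Transport-level variant of that client-side approach might look like. This is not the real `Network.Transport` API; it is a simplified `Event` type, and the token and `OutgoingConnectionClosed` constructor are the hypothetical additions discussed above.

```haskell
import Data.ByteString (ByteString)
import Data.Word (Word64)

type ConnectionId    = Word64   -- receiver-side identifier, as today
type ConnectionToken = Word64   -- hypothetical: chosen by the sender at connect time

data Event
  = Received ConnectionId [ByteString]
  | ConnectionOpened ConnectionId
  | ConnectionClosed ConnectionId
    -- Hypothetical new event, delivered at the *sending* endpoint once the
    -- peer has acknowledged the close. Because the token was chosen by the
    -- sender, the sender knows which of its outgoing connections this refers
    -- to and that all messages sent on it have been received, so it can now
    -- open a fresh connection (or flush buffered messages) without risking
    -- reordering.
  | OutgoingConnectionClosed ConnectionToken
```

Presumably `connect` would then take the token as an extra argument, so that it can remain asynchronous while still letting the closure be reported back to the sender.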
Copied from original issue: haskell-distributed/distributed-process#64