From @netogallo on December 16, 2012 21:24
Hi, I am trying to get involved in the development of CH and thought about starting with this issue. I just wanted to ask if I understand the issue at a technical level.

Each node has an `EndPoint` (`localEndPoint`), and every time `newLocalNode` is called a new endpoint is created for that node, so there is one node for every endpoint (even though it is theoretically possible to have multiple nodes for a single endpoint using the `createBareLocalNode` function). So does the issue consist of implementing a function `closeConnectionsTo` which, given a node, calls the `close` function on all connections in the `LocalNodeState`? I know nodes and endpoints are different, but I can't find any place where endpoints and the connections opened with them can be associated, except via the node to which the endpoint belongs. It would be nice if some light could be shed here.
Thank you
From @edsko on December 17, 2012 8:29
No, `closeConnectionTo` is a purely `Network.Transport`-level construct. You'd be working with `Network.Transport` and `Network.Transport.TCP`. You'd also have to think very carefully about how this construct would work in other, potentially quite different, network transports. This one isn't particularly easy.
From @hyperthunk on December 18, 2012 11:34
So @edsko, it sounds like what we're asking for is a means of tracking all the connections associated with a specific `EndPoint` and having a kind of bulk-close. And that needs to work at the transport level.

I think the abstraction isn't too hard to get right. If you move the mechanism into the policy layer (i.e., the `Network.Transport` API) and basically provide the data structures and manipulation logic there, then the utility layer (e.g., `Network.Transport.Foo` or whatever) can simply call into these.

Better still, move the API calls into the policy layer itself and parameterise them by the implementation. Say, for example, that we have some type class (I know, I'm obsessed with these, right!) that we can work with. Somewhere in `Network.Transport` we might see this kind of code...
```haskell
type CreateTransportResult = Either (TransportError NewEndPointErrorCode) EndPoint

data TransportInternals a = TransportInternals a

class TransportProvider a where
  createNew    :: a -> TransportConfiguration -> IO CreateTransportResult
  getInternals :: a -> Transport -> TransportInternals a

-- the policy layer just delegates to whichever backend it was given
createTransport :: TransportProvider a => a -> TransportConfiguration -> IO CreateTransportResult
createTransport provider config = createNew provider config
```
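For illustration, a backend would then supply an instance along these lines (everything here is invented for the sketch; only `TransportError` and `NewEndPointFailed` are real `Network.Transport` names):

```haskell
-- A do-nothing backend, just to show the shape of an instance.
data DummyProvider = DummyProvider

instance TransportProvider DummyProvider where
  -- a real backend would open sockets etc.; here we just fail
  createNew _ _config =
    return (Left (TransportError NewEndPointFailed "dummy backend"))
  getInternals p _transport = TransportInternals p
```

With that in place, `createTransport DummyProvider config` routes through the backend while the policy layer stays in charge of any shared bookkeeping.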
Now you can do something similar (with or without the type classes) so that the implementations have to provide some sort of `EndPointProvider` as well. The key to this is to make sure that the functions which deal with an endpoint are defined in the policy layer, where we can handle tracking them properly. Quite where we choose to maintain this state is debatable. We could use shared memory or go some other way, e.g.,
```haskell
-- NB: StrictList, appendSTM, foldrSTM, writeSTM and backendConnect are
-- hypothetical helpers, sketched here for the sake of the argument.
data EndPoint = EndPoint {
    provider    :: EndPointProvider              -- provides the callback functions
  , address     :: EndPointAddress
  , connections :: TVar (StrictList Connection)  -- shared, STM-managed state
  --- etc
  }

connectEndpoint :: EndPoint -> Reliability -> ConnectHints
                -> IO (Either (TransportError ConnectErrorCode) Connection)
connectEndpoint ep r h = do
  let addr = address ep   -- plain record fields, no IO involved
      impl = provider ep
  -- let the backend do the real work
  conn <- backendConnect impl addr r h
  case conn of
    Left _   -> return conn
    Right cn -> appendSTM (connections ep) cn >> return conn  -- track the new connection

disconnectEndpoint :: EndPoint -> IO () --- probably want some return value but hey...
disconnectEndpoint ep = do
  let conns = connections ep
  foldrSTM conns closeConnection -- we can collect the outcomes here...
  writeSTM conns newEmptyStrictList
```
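To make the intent concrete, usage might look like this (again purely hypothetical, building on the sketch above; `ReliableOrdered` and `defaultConnectHints` are the real `Network.Transport` names):

```haskell
-- Open one connection to the peer this EndPoint value represents,
-- then later drop every tracked connection to it in one call.
demo :: EndPoint -> IO ()
demo ep = do
  r <- connectEndpoint ep ReliableOrdered defaultConnectHints
  case r of
    Left err -> putStrLn ("connect failed: " ++ show err)
    Right _  -> disconnectEndpoint ep
```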
Anyway, that's just a sketch, but hopefully you see my point. By moving some of the mechanism into `Network.Transport` we can handle these kinds of things in that layer, and leave the particulars of mechanism and utility (where you're dealing with specific implementation details) to the backends.

So gentlemen, how does that sound? Certainly it's a bigger refactoring than just implementing `closeConnectionTo`, but does it seem like the right approach in principle?

And if so, how do you feel about picking this up and making a start, @netogallo? We'll step in and help as needed of course, as and when we're able to.
From @edsko on December 18, 2012 11:43
No, I don't think that's the right way to go. You need to add `closeConnectionTo` to `Network.Transport` alright, but nothing else. None of the above data structures should go there, because different transport implementations might implement this in entirely different ways. For instance, `Network.Transport.TCP` goes to great lengths to make sure there is at most a single TCP connection between any two endpoints. So, in that case, `closeConnectionTo` is simply a matter of closing that socket, and dealing with the fallout.
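A very rough sketch of that shape (the types and helpers below are hypothetical stand-ins, not the actual `Network.Transport.TCP` internals; only the event types come from the real `Network.Transport` API):

```haskell
import Network.Transport
  ( EndPointAddress, Event(..), EventErrorCode(..), TransportError(..) )

-- Hypothetical internal state; the real Network.Transport.TCP is far
-- more involved (locking, connection states, pending requests, ...).
data LocalEndPoint = LocalEndPoint
  { enqueueEvent :: Event -> IO ()                                -- deliver to this endpoint's event queue
  , lookupRemote :: EndPointAddress -> IO (Maybe RemoteEndPoint)  -- find the bundle for a peer
  }

newtype RemoteEndPoint = RemoteEndPoint
  { closeHeavyweightSocket :: IO ()  -- close the single TCP socket to the peer
  }

-- Close the one heavyweight connection and report the "fallout" locally.
closeConnectionTo :: LocalEndPoint -> EndPointAddress -> IO ()
closeConnectionTo ourEndPoint theirAddr = do
  mpeer <- lookupRemote ourEndPoint theirAddr
  case mpeer of
    Nothing   -> return ()  -- no bundle to this peer: nothing to do
    Just peer -> do
      closeHeavyweightSocket peer
      enqueueEvent ourEndPoint $
        ErrorEvent (TransportError (EventConnectionLost theirAddr) "closed locally")
```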
`Network.Transport` is tiny -- it is just the API specification really, and it should probably stay that way.

You are welcome to try and modify `Network.Transport.TCP`, but be aware that it is a very complicated piece of code -- at least, by my standards. Probably the hardest bit of code I've ever written. I don't think `closeConnectionTo` should be too difficult to add, but the issues are subtle so I can't say for sure.
From @edsko on December 18, 2012 11:50
As regards getting the abstraction right -- you will need to talk to people involved with other kinds of transports. Is it realistic to demand something like `closeConnectionTo` in a UDP transport? (Probably.) What about Infiniband (CCI)? (Less sure there.)
From @hyperthunk on December 18, 2012 11:51
> None of the above data structures should go there, because different transport implementations might implement this in entirely different ways. For instance, Network.Transport.TCP goes to great lengths to make sure there is at most a single TCP connection between any two endpoints.

Hmn, good point, I hadn't thought about that. Clearly it's much better for `Network.Transport.TCP` to just close the socket, so folding over the lightweight connections and closing them is pointless.
> I don't think closeConnectionTo should be too difficult to add, but the issues are subtle so I can't say for sure.

Well, if your comment about just closing the socket is right then it doesn't sound too onerous. Where would you expect the API to live? On `EndPoint`, like so?

```haskell
data EndPoint = EndPoint {
    receive :: IO Event
    --- snip
  , disconnectEndpoint :: IO ()
  }
```
What is the difference between `closeConnectionTo` and the `closeEndPoint :: IO ()` that already exists? Is it because the other providers might want to tear their connections down without closing the endpoint itself? What's the use-case for that? The TODO in `Node.hs` says that you're completely disconnecting from the `EndPoint` because of the error, if I'm reading it right.

It does sound like for TCP, `closeConnectionTo == closeEndPoint` anyway, perhaps with some added error/exception handling.
From @hyperthunk on December 18, 2012 11:53
> As regards getting the abstraction right -- you will need to talk to people involved with other kinds of transports. Is it realistic to demand something like closeConnectionTo in a UDP transport? (Probably.) What about Infiniband (CCI)? (Less sure there.)

Yes indeed.

> What's the use-case for that?

Just realised - you're closing the connections to the other endpoint but not closing the local end!
From @edsko on December 18, 2012 11:58
> Just realised - you're closing the connections to the other endpoint but not closing the local end!

Yes, this is not `closeEndPoint`. The `Network.Transport` API has this concept of a 'bundle' of connections between two endpoints. `closeConnectionTo` closes such a bundle in one go, without closing the endpoints themselves or connections they might have to other endpoints.

Note that that "in one go" is in serious need of expansion. How synchronous is this operation? What about data currently in the network? Etc. etc. These are not easy questions to answer.
From @netogallo on December 18, 2012 12:12
Implementing such a function does seem to require careful consideration, since users will expect somewhat homogeneous behaviour across all transports, which might be tricky. I will probably return to this issue once I have a better understanding of transports, especially since I am not a networks guy. I believe this discussion will be very useful for tackling the problem, but for the moment I will review CH a little more to figure out where my skills can be most useful. Thank you for the valuable feedback.
From @hyperthunk on December 18, 2012 13:41
@edsko - we deal with these kinds of problems every day at RabbitMQ and they are indeed very tricky to get right. In fact, I'd say I spend > 30% of my engineering efforts hardening the broker against situations that arise because of unreliable networks, and that's just considering TCP/IP.
The problem with making this operation synchronous is that it can block for an arbitrary period of time, depending on the semantics of the underlying network/transport/protocol/etc. That's not good news when you're trying to shut down, upgrade, restart or whatever. One solution is timeouts, but then you're never sure just how long they should be and of course this should be a matter of policy. For connected protocols like TCP, using heartbeats is usually the right solution, with configurable parameters to control the frequency. Even for disconnected protocols this is usually a good idea.
Making this operation asynchronous is fine, in theory, but there would need to be some kind of completion callback or result on which the caller can block if required.
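For example, one hypothetical shape for such an asynchronous variant, where the caller gets a handle it can block on (nothing like this exists in `network-transport`; all names below are made up):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar)
import Network.Transport (EndPoint, EndPointAddress)

-- Kick off the teardown in the background and return a completion handle;
-- callers that care can 'takeMVar' it, others can simply drop it.
closeConnectionToAsync :: EndPoint -> EndPointAddress -> IO (MVar ())
closeConnectionToAsync ep addr = do
  done <- newEmptyMVar
  _ <- forkIO $ do
    tearDownBundle ep addr  -- hypothetical synchronous teardown
    putMVar done ()
  return done
  where
    -- Stand-in for the real (transport-specific) teardown logic.
    tearDownBundle :: EndPoint -> EndPointAddress -> IO ()
    tearDownBundle _ _ = return ()
```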
From @edsko on December 18, 2012 13:45
Yes, agreed on all the above. My main point is: don't rush into this one :)
I believe it makes sense to include `closeConnectionTo :: EndPoint -> EndPointAddress -> IO ()` in network-transport. It's not about freeing up resources, it's about controlling the event queue. Its semantics are all about what comes out of the `EndPoint`'s event queue, and I believe concrete transports should have no trouble meeting the requirements.

The term is `closeConnectionTo ep addr`.
- An open connection is identified by a `connid :: ConnectionId` such that a `ConnectionOpened connid _ addr` has been enqueued at `ep`, but neither `ConnectionClosed connid` nor `EventConnectionLost addr` has been enqueued afterwards.
- For every open connection, no `Received connid _` events may be enqueued at `ep` after `closeConnectionTo ep addr` returns.
- An `EventConnectionLost addr` must be enqueued at `ep` before any `ConnectionOpened _ _ addr` is enqueued there.
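To make those guarantees concrete, here is how a consumer of the event queue might rely on them (illustrative only; `eventLoop` is not part of any API, but the event constructors are the real `Network.Transport` ones):

```haskell
import qualified Data.Map.Strict as Map
import Network.Transport

-- Track which ConnectionIds are open and which peer they came from.
-- Per the spec above: Received only arrives for tracked ids, and after
-- an EventConnectionLost for a peer, all of that peer's ids are dead
-- until a fresh ConnectionOpened arrives.
eventLoop :: EndPoint -> Map.Map ConnectionId EndPointAddress -> IO ()
eventLoop ep open = do
  ev <- receive ep
  case ev of
    ConnectionOpened cid _ addr ->
      eventLoop ep (Map.insert cid addr open)
    ConnectionClosed cid ->
      eventLoop ep (Map.delete cid open)
    Received _cid _payload ->
      eventLoop ep open  -- the id is guaranteed to be tracked in 'open'
    ErrorEvent (TransportError (EventConnectionLost addr) _) ->
      -- the whole bundle from addr closed in one go
      eventLoop ep (Map.filter (/= addr) open)
    EndPointClosed ->
      return ()
    _ ->
      eventLoop ep open
```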
`closeConnectionTo` is a means of stopping all further `Received` events from a given peer, until either party tries to connect to the other again. It doesn't matter how the concrete transport is implemented: sometimes `closeConnectionTo` can free up some resources (TCP), sometimes it can't (UDP), or maybe it's irrelevant because both clients have the same heap (in-memory). The feature is about control of events rather than resources; the implications for resources are merely to be taken advantage of by some concrete transports. To `closeConnectionTo` is to demand that no more bytes will come through with the connection identifiers for currently open connections, and that's a useful tool regardless of what happens under the surface.
It may be enlightening to bring up issue haskell-distributed/distributed-process#405. If that were implemented, then the absence of `closeConnectionTo` would be a great asymmetry: any `EndPoint` can cap the local-provenance events for a given lightweight connection (by calling `close`) but has absolutely no control over the remote-provenance events. Bringing in `closeConnectionTo` makes it slightly better: an `EndPoint` has the ability to sever all incoming connections from a given peer at will (guaranteeing that no more `Data Peer` events appear). It's coarser, but at least the capability is there.
From @edsko on September 24, 2012 12:54

Add `closeConnectionTo :: EndPoint -> EndPointAddress -> IO ()`, which closes the entire "bundle" of (outgoing and incoming) connections to another endpoint. Basically, "disconnect completely from this other endpoint" (a "heavyweight disconnect").

Once this is implemented we can resolve a TODO in `Control.Distributed.Process.Node`.

Copied from original issue: haskell-distributed/distributed-process#37