From @netogallo on December 16, 2012 21:24
Hi, I am trying to get involved in the development of CH and thought about starting with this issue. I just wanted to ask if I understand the issue at a technical level.

Each node has an `EndPoint` (`localEndPoint`), and every time `newLocalNode` is called a new endpoint is created for that node, so there is one node for every endpoint (even though it is theoretically possible to have multiple nodes for a single endpoint using the `createBareLocalNode` function). So does the issue consist of implementing a function `closeConnectionsTo` which, given a node, calls the `close` function on all connections in the `LocalNodeState`? I know nodes and endpoints are different, but I can't find any place where endpoints and the connections opened with them can be associated, except via the node to which the endpoint belongs. It would be nice if some light could be shed here.
Thank you
From @edsko on December 17, 2012 8:29
No, `closeConnectionTo` is a purely `Network.Transport`-level construct. You'd be working with `Network.Transport` and `Network.Transport.TCP`. You'd also have to think very carefully about how this construct would work in other, potentially quite different, network transports. This one isn't particularly easy.
From @hyperthunk on December 18, 2012 11:34
So @edsko, it sounds like what we're asking for is a means of tracking all the connections associated with a specific `EndPoint` and having a kind of bulk-close. And that needs to work at the transport level.

I think the abstraction isn't too hard to get right. If you move the mechanism into the policy layer (i.e., the `Network.Transport` API) and basically provide the data structures and manipulation logic there, then the utility layer (e.g., `Network.Transport.Foo` or whatever) can simply call into these.

Better still, move the API calls into the policy layer itself and parameterise them by the implementation. Say, for example, that we have some type class (I know, I'm obsessed with these, right!) that we can work with. Somewhere in `Network.Transport` we might see this kind of code...
```haskell
type CreateTransportResult = Either (TransportError NewEndPointErrorCode) EndPoint

data TransportInternals a = TransportInternals a

class TransportProvider a where
  createNew    :: a -> TransportConfiguration -> IO CreateTransportResult
  getInternals :: a -> Transport -> TransportInternals a

-- the policy layer just delegates to whichever backend it was given
createTransport :: TransportProvider a => a -> TransportConfiguration -> IO CreateTransportResult
createTransport provider config = createNew provider config
```
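For illustration, a backend would then supply an instance along these lines (everything here is invented for the sketch; only `TransportError` and `NewEndPointFailed` are real `Network.Transport` names):

```haskell
-- A do-nothing backend, just to show the shape of an instance.
data DummyProvider = DummyProvider

instance TransportProvider DummyProvider where
  -- a real backend would open sockets etc.; here we just fail
  createNew _ _config =
    return (Left (TransportError NewEndPointFailed "dummy backend"))
  getInternals p _transport = TransportInternals p
```

With that in place, `createTransport DummyProvider config` routes through the backend while the policy layer stays in charge of any shared bookkeeping.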
Now you can do something similar (with or without the type classes) so that the implementations have to provide some sort of `EndPointProvider` as well. The key to this is to make sure that the functions which deal with an endpoint are defined in the policy layer, where we can handle tracking them properly. Quite where we choose to maintain this state is debatable. We could use shared memory or go some other way, e.g.,
```haskell
-- NB: StrictList, appendSTM, foldrSTM, writeSTM and backendConnect are
-- hypothetical helpers, sketched here for the sake of the argument.
data EndPoint = EndPoint {
    provider    :: EndPointProvider              -- provides the callback functions
  , address     :: EndPointAddress
  , connections :: TVar (StrictList Connection)  -- shared, STM-managed state
  --- etc
  }

connectEndpoint :: EndPoint -> Reliability -> ConnectHints
                -> IO (Either (TransportError ConnectErrorCode) Connection)
connectEndpoint ep r h = do
  let addr = address ep   -- plain record fields, no IO involved
      impl = provider ep
  -- let the backend do the real work
  conn <- backendConnect impl addr r h
  case conn of
    Left _   -> return conn
    Right cn -> appendSTM (connections ep) cn >> return conn  -- track the new connection

disconnectEndpoint :: EndPoint -> IO () --- probably want some return value but hey...
disconnectEndpoint ep = do
  let conns = connections ep
  foldrSTM conns closeConnection -- we can collect the outcomes here...
  writeSTM conns newEmptyStrictList
```
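To make the intent concrete, usage might look like this (again purely hypothetical, building on the sketch above; `ReliableOrdered` and `defaultConnectHints` are the real `Network.Transport` names):

```haskell
-- Open one connection to the peer this EndPoint value represents,
-- then later drop every tracked connection to it in one call.
demo :: EndPoint -> IO ()
demo ep = do
  r <- connectEndpoint ep ReliableOrdered defaultConnectHints
  case r of
    Left err -> putStrLn ("connect failed: " ++ show err)
    Right _  -> disconnectEndpoint ep
```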
Anyway, that's just a sketch, but hopefully you see my point. By moving some of the mechanism into `Network.Transport` we can handle these kinds of things in that layer, and leave the particulars of mechanism and utility (where you're dealing with specific implementation details) to the backends.

So gentlemen, how does that sound? Certainly it's a bigger refactoring than just implementing `closeConnectionTo`, but does it seem like the right approach in principle?

And if so, how do you feel about picking this up and making a start, @netogallo? We'll step in and help as needed of course, as and when we're able to.
From @edsko on December 18, 2012 11:43
No, I don't think that's the right way to go. You need to add `closeConnectionTo` to `Network.Transport` alright, but nothing else. None of the above data structures should go there, because different transport implementations might implement this in entirely different ways. For instance, `Network.Transport.TCP` goes to great lengths to make sure there is at most a single TCP connection between any two endpoints. So, in that case, `closeConnectionTo` is simply a matter of closing that socket, and dealing with the fallout.
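A very rough sketch of that shape (the types and helpers below are hypothetical stand-ins, not the actual `Network.Transport.TCP` internals; only the event types come from the real `Network.Transport` API):

```haskell
import Network.Transport
  ( EndPointAddress, Event(..), EventErrorCode(..), TransportError(..) )

-- Hypothetical internal state; the real Network.Transport.TCP is far
-- more involved (locking, connection states, pending requests, ...).
data LocalEndPoint = LocalEndPoint
  { enqueueEvent :: Event -> IO ()                                -- deliver to this endpoint's event queue
  , lookupRemote :: EndPointAddress -> IO (Maybe RemoteEndPoint)  -- find the bundle for a peer
  }

newtype RemoteEndPoint = RemoteEndPoint
  { closeHeavyweightSocket :: IO ()  -- close the single TCP socket to the peer
  }

-- Close the one heavyweight connection and report the "fallout" locally.
closeConnectionTo :: LocalEndPoint -> EndPointAddress -> IO ()
closeConnectionTo ourEndPoint theirAddr = do
  mpeer <- lookupRemote ourEndPoint theirAddr
  case mpeer of
    Nothing   -> return ()  -- no bundle to this peer: nothing to do
    Just peer -> do
      closeHeavyweightSocket peer
      enqueueEvent ourEndPoint $
        ErrorEvent (TransportError (EventConnectionLost theirAddr) "closed locally")
```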
`Network.Transport` is tiny -- it is just the API specification really, and it should probably stay that way.

You are welcome to try and modify `Network.Transport.TCP`, but be aware that it is a very complicated piece of code -- at least, by my standards. Probably the hardest bit of code I've ever written. I don't think `closeConnectionTo` should be too difficult to add, but the issues are subtle so I can't say for sure.
From @edsko on December 18, 2012 11:50
As regards getting the abstraction right -- you will need to talk to people involved with other kinds of transports. Is it realistic to demand something like `closeConnectionTo` in a UDP transport? (Probably.) What about Infiniband (CCI)? (Less sure there.)
From @hyperthunk on December 18, 2012 11:51
> None of the above data structures should go there, because different transport implementations might implement this in entirely different ways. For instance, Network.Transport.TCP goes to great lengths to make sure there is at most a single TCP connection between any two endpoints.

Hmn, good point, I hadn't thought about that. Clearly it's much better for `Network.Transport.TCP` to just close the socket, so folding over the lightweight connections and closing them is pointless.
> I don't think closeConnectionTo should be too difficult to add, but the issues are subtle so I can't say for sure.

Well, if your comment about just closing the socket is right then it doesn't sound too onerous. Where would you expect the API to live? On `EndPoint`, like so?

```haskell
data EndPoint = EndPoint {
    receive :: IO Event
    --- snip
  , disconnectEndpoint :: IO ()
  }
```
What is the difference between `closeConnectionTo` and the `closeEndPoint :: IO ()` that already exists? Is it because the other providers might want to tear their connections down without closing the endpoint itself? What's the use-case for that? The TODO in `Node.hs` says that you're completely disconnecting from the `EndPoint` because of the error, if I'm reading it right.

It does sound like for TCP, `closeConnectionTo == closeEndPoint` anyway, perhaps with some added error/exception handling.
From @hyperthunk on December 18, 2012 11:53
> As regards getting the abstraction right -- you will need to talk to people involved with other kinds of transports. Is it realistic to demand something like closeConnectionTo in a UDP transport? (Probably.) What about Infiniband (CCI)? (Less sure there.)

Yes indeed.

> What's the use-case for that?

Just realised - you're closing the connections to the other endpoint but not closing the local end!
From @edsko on December 18, 2012 11:58
> Just realised - you're closing the connections to the other endpoint but not closing the local end!

Yes, this is not `closeEndPoint`. The `Network.Transport` API has this concept of a 'bundle' of connections between two endpoints. `closeConnectionTo` closes such a bundle in one go, without closing the endpoints themselves or connections they might have to other endpoints.

Note that that "in one go" is in serious need of expansion. How synchronous is this operation? What about data currently in the network? Etc. etc. These are not easy questions to answer.
From @netogallo on December 18, 2012 12:12
Implementing such a function does seem to require careful consideration, since users will expect somewhat homogeneous behaviour across all transports, which might be tricky. I will probably return to this issue once I have a better understanding of transports, especially since I am not a networks guy. I believe this discussion will be very useful for tackling the problem, but for the moment I will review CH a little more to figure out where my skills can be most useful. Thank you for the valuable feedback.
From @hyperthunk on December 18, 2012 13:41
@edsko - we deal with these kinds of problems every day at RabbitMQ and they are indeed very tricky to get right. In fact, I'd say I spend > 30% of my engineering efforts hardening the broker against situations that arise because of unreliable networks, and that's just considering TCP/IP.
The problem with making this operation synchronous is that it can block for an arbitrary period of time, depending on the semantics of the underlying network/transport/protocol/etc. That's not good news when you're trying to shut down, upgrade, restart or whatever. One solution is timeouts, but then you're never sure just how long they should be and of course this should be a matter of policy. For connected protocols like TCP, using heartbeats is usually the right solution, with configurable parameters to control the frequency. Even for disconnected protocols this is usually a good idea.
Making this operation asynchronous is fine, in theory, but there would need to be some kind of completion callback or result on which the caller can block if required.
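For example, one hypothetical shape for such an asynchronous variant, where the caller gets a handle it can block on (nothing like this exists in `network-transport`; all names below are made up):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar)
import Network.Transport (EndPoint, EndPointAddress)

-- Kick off the teardown in the background and return a completion handle;
-- callers that care can 'takeMVar' it, others can simply drop it.
closeConnectionToAsync :: EndPoint -> EndPointAddress -> IO (MVar ())
closeConnectionToAsync ep addr = do
  done <- newEmptyMVar
  _ <- forkIO $ do
    tearDownBundle ep addr  -- hypothetical synchronous teardown
    putMVar done ()
  return done
  where
    -- Stand-in for the real (transport-specific) teardown logic.
    tearDownBundle :: EndPoint -> EndPointAddress -> IO ()
    tearDownBundle _ _ = return ()
```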
From @edsko on December 18, 2012 13:45
Yes, agreed on all the above. My main point is: don't rush into this one :)
I believe it makes sense to include `closeConnectionTo :: EndPoint -> EndPointAddress -> IO ()` in network-transport. It's not about freeing up resources, it's about controlling the event queue. Its semantics are all about what comes out of the `EndPoint`'s event queue, and I believe concrete transports should have no trouble meeting the requirements.

The term is `closeConnectionTo ep addr`.
- An open connection is identified by a `connid :: ConnectionId` such that a `ConnectionOpened connid _ addr` has been enqueued at `ep`, but neither `ConnectionClosed connid` nor `EventConnectionLost addr` has been enqueued afterwards.
- For every open connection, no `Received connid _` events may be enqueued at `ep` after `closeConnectionTo ep addr` returns.
- An `EventConnectionLost addr` must be enqueued at `ep` before any `ConnectionOpened _ _ addr` is enqueued there.
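To make those guarantees concrete, here is how a consumer of the event queue might rely on them (illustrative only; `eventLoop` is not part of any API, but the event constructors are the real `Network.Transport` ones):

```haskell
import qualified Data.Map.Strict as Map
import Network.Transport

-- Track which ConnectionIds are open and which peer they came from.
-- Per the spec above: Received only arrives for tracked ids, and after
-- an EventConnectionLost for a peer, all of that peer's ids are dead
-- until a fresh ConnectionOpened arrives.
eventLoop :: EndPoint -> Map.Map ConnectionId EndPointAddress -> IO ()
eventLoop ep open = do
  ev <- receive ep
  case ev of
    ConnectionOpened cid _ addr ->
      eventLoop ep (Map.insert cid addr open)
    ConnectionClosed cid ->
      eventLoop ep (Map.delete cid open)
    Received _cid _payload ->
      eventLoop ep open  -- the id is guaranteed to be tracked in 'open'
    ErrorEvent (TransportError (EventConnectionLost addr) _) ->
      -- the whole bundle from addr closed in one go
      eventLoop ep (Map.filter (/= addr) open)
    EndPointClosed ->
      return ()
    _ ->
      eventLoop ep open
```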
`closeConnectionTo` is a means of stopping all further `Received` events from a given peer, until either party tries to connect to the other again. It doesn't matter how the concrete transport is implemented: sometimes `closeConnectionTo` can free up some resources (TCP), sometimes it can't (UDP), or maybe it's irrelevant because both clients have the same heap (in-memory). The feature is about control of events rather than resources; the implications for resources are merely to be taken advantage of by some concrete transports. To `closeConnectionTo` is to demand that no more bytes will come through with the connection identifiers for currently open connections, and that's a useful tool regardless of what happens under the surface.
It may be enlightening to bring up issue haskell-distributed/distributed-process#405. If that were implemented, then the absence of `closeConnectionTo` would be a great asymmetry: any `EndPoint` can cap the local-provenance events for a given lightweight connection (by calling `close`) but has absolutely no control over the remote-provenance events. Bringing in `closeConnectionTo` makes it slightly better: an `EndPoint` has the ability to sever all incoming connections from a given peer at will (guaranteeing that no more `Data Peer` events appear). It's coarser, but at least the capability is there.
From @edsko on September 24, 2012 12:54

Add `closeConnectionTo :: EndPoint -> EndPointAddress -> IO ()`, which closes the entire "bundle" of (outgoing and incoming) connections to another endpoint. Basically, "disconnect completely from this other endpoint" (a "heavyweight disconnect").

Once this is implemented we can resolve a TODO in `Control.Distributed.Process.Node`.

Copied from original issue: haskell-distributed/distributed-process#37