Channel vs Process based GenServer

hyperthunk commented 12 years ago

I don't know if a full split is necessary or even a good idea, but for handleInfo and in particular in order to deal with messages sent to the process by monitors, we cannot be entirely oriented around typed channels.

rodlogic commented 12 years ago

Yes, this is an important topic for discussion, since it could affect the core GenServer design. I don't have a strong opinion either way at this point, so I need to do some homework.

In the interest of gathering some additional data for my own understanding, here is a dull search for handle_info usage in Riak's code base (distinct tags: inet_async, 'EXIT', _ReqId, gen_event_EXIT, 'DOWN', tcp_closed, tcp_error, tcp, ssl_closed, ssl_error, ssl, nodeup, nodedown):

https://gist.github.com/9aca21d61fdb5eecf0a6

And here for RabbitMQ's server code base (distinct tags: 'DOWN', nodeup, new_pg2_fixed, 'EXIT', bump_credit, mnesia_system_event, delayed_restart, and inet_async):

https://gist.github.com/fedcf22f077d592df7a7

It is a rather simplistic view on handle_info usage, but gives an idea of how and how much it is used.

From the above sample handle_info messages, which ones are really dynamic ones for which the implementors had no a priori knowledge? And which ones are just part of a "secondary" protocol that could be as typed as the primary protocol messages?

Or is it that handle_info messages are part of a "common" protocol that is common across a set of disparate processes and should be defined once and reused many times? Even so I could see these implemented either as process or channel messages.

This brings me back to the point I made earlier: a server is a collection of services, and these services could be either calls or casts/info, newly defined by the designer of the server and/or reused from other servers/modules.

E.g. to illustrate the point:

Server: Counter
    service1: call(count :: () -> Int)
    service2 : cast(reset :: () -> ())                -- this is a new service for counter
    imported service2: cast(NodeDown -> ())  -- this is the piece that could be defined in a node module

Behind the scenes it seems that they could well be implemented with expect/receiveWait or receiveChan, but I am not sure if both are supported in Cloud Haskell.
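To make the "server as a collection of services" idea a bit more concrete, here is a minimal sketch using only base. It is a pure model, not Cloud Haskell code: the names Handler, call, cast, dispatch and the message types are all hypothetical, and I am assuming that count bumps the counter and replies with the new value.

```haskell
import Data.Dynamic (Dynamic, fromDynamic, toDyn)
import Data.Maybe (listToMaybe, mapMaybe)
import Data.Typeable (Typeable)

-- Server state (hypothetical).
newtype Counter = Counter Int deriving (Eq, Show)

-- One message type per service (hypothetical names).
data Count = Count                 -- call: expects an Int reply
data Reset = Reset                 -- cast: no reply
newtype NodeDown = NodeDown String -- cast that a node module could define once

-- A handler looks at an untyped message. Nothing means "not my message";
-- otherwise it returns the new state and an optional reply.
type Handler s = s -> Dynamic -> Maybe (s, Maybe Dynamic)

-- A call handles one request type and always produces a reply.
call :: (Typeable req, Typeable rep) => (s -> req -> (s, rep)) -> Handler s
call f s msg = do
  req <- fromDynamic msg
  let (s', rep) = f s req
  pure (s', Just (toDyn rep))

-- A cast handles one request type and never replies.
cast :: Typeable req => (s -> req -> s) -> Handler s
cast f s msg = do
  req <- fromDynamic msg
  pure (f s req, Nothing)

-- The server is just the list of its services, new or imported.
counterServices :: [Handler Counter]
counterServices =
  [ call (\(Counter n) Count -> (Counter (n + 1), n + 1))
  , cast (\_ Reset -> Counter 0)
  , cast (\s (NodeDown _) -> s) -- imported service: ignore the event here
  ]

-- The first handler that accepts the message wins.
dispatch :: [Handler s] -> s -> Dynamic -> Maybe (s, Maybe Dynamic)
dispatch hs s msg = listToMaybe (mapMaybe (\h -> h s msg) hs)
```

In this model, dispatch counterServices (Counter 4) (toDyn Reset) yields the reset state with no reply, and a message of a type that no service declares yields Nothing, which is exactly the case a handle_info style fallback would have to cover.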

rodlogic commented 12 years ago

@edsko Could you chip in to give us an idea of Cloud Haskell's design wrt channels vs process messages here?

First, can one reliably use both at the same time to receive messages? I see in the API that it is possible to merge receive ports for channels and that it is possible to receiveWait on many types of messages, but could both receiveChan and receiveWait be used at the same time by a single process? Or does the design assume a process implementation has to choose one over the other?

A second question I have is wrt multiple channels in a single process. When creating a process with spawnLocal, the returned ProcessId can be used to send any message type, so no complications here. spawnChannelLocal, however, returns a single SendPort. Does the API assume that this SendPort should be used to 'discover' or retrieve additional ports from the process? I am curious whether this was motivated by an explicit design decision that could invalidate the whole idea of using multiple typed channels for a single process.

hyperthunk commented 12 years ago

And here for RabbitMQ's server code base (distinct tags: 'DOWN', nodeup, new_pg2_fixed, 'EXIT', bump_credit, mnesia_system_event, delayed_restart, and inet_async):

In RabbitMQ, like most Erlang code bases, we use 'EXIT' and 'DOWN' extensively. Cloud Haskell doesn't appear to support 'EXIT' signals in the same way Erlang does, but I don't want to dwell on that here as it's a big(ish) conversation and I'd like to park it until we're looking at supervision trees in earnest. The {'DOWN', MonitorRef, process, Pid, ExitReason} tuple is what monitors deliver. In Cloud Haskell monitor signals can only be consumed by expect/receive* afaict, not via channels.

A gen_event_EXIT signal is given when a supervised gen_event handler crashes, and is used to provide restartable event handlers (see https://github.com/hyperthunk/nodewatch/blob/master/dxkit/src/dxkit_event_handler_bridge.erl for a very simple example of how this works in practice to force a restart via the parent supervisor). The nodeup | nodedown messages are what node monitors deliver.

The only other thing there that isn't Rabbit-specific is inet_async. In Erlang, sockets are implemented as linked-in drivers, and data from sockets is delivered to the socket's controlling process as messages. In RabbitMQ the socket writer process needs to write a lot of data out quickly. The gen_tcp:send/2 API call writes to the port driver and then blocks on a selective receive, waiting for inet_async to indicate completion. That is bad in our case because the writer process may have a very large message queue, which then has to be scanned for the inet_async response.

From the above sample handle_info messages, which ones are really dynamic ones for which the implementors had no a priori knowledge? And which ones are just part of a "secondary" protocol that could be as typed as the primary protocol messages?

So the answer is there's a mix.

Or is it that handle_info messages are part of a "common" protocol that is common across a set of disparate processes and should be defined once and reused many times? Even so I could see these implemented either as process or channel messages.

I think you're missing the point of handle_info. It can be used to deal with secondary protocols and often is, but its primary purpose is to allow the gen server to deal with unexpected traffic. If you don't do something with messages sent to the process then they just fill up your mailbox (which has performance implications) and because the gen server abstraction is meant to manage the mailbox on behalf of the server - which deals with the functional aspects only - we can't just ignore the fact that unsolicited mail could arrive.
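A small pure model of the mailbox problem, using only base. The list stands in for a process mailbox and nothing here is the Cloud Haskell or OTP API; receiveFirst, drainTyped and drainWithInfo are hypothetical names. It contrasts a server that only does selective receive on the type it knows with one that also has a handle_info style fallback.

```haskell
import Data.Dynamic (Dynamic, fromDynamic, toDyn)

-- Selective receive over a list mailbox: take the first message the matcher
-- accepts and leave every other message queued in its original order.
receiveFirst :: (Dynamic -> Maybe a) -> [Dynamic] -> Maybe (a, [Dynamic])
receiveFirst _ [] = Nothing
receiveFirst p (m : ms) = case p m of
  Just a  -> Just (a, ms)
  Nothing -> fmap (\(a, rest) -> (a, m : rest)) (receiveFirst p ms)

-- A server that only understands Int requests. Anything else is never
-- consumed, so it stays in the mailbox.
drainTyped :: [Dynamic] -> ([Int], [Dynamic])
drainTyped mb = case receiveFirst fromDynamic mb of
  Just (n, rest) -> let (ns, left) = drainTyped rest in (n : ns, left)
  Nothing        -> ([], mb)

-- The same server with a catch-all: every message is consumed, either by
-- the typed handler or by the fallback (which here just counts them).
drainWithInfo :: [Dynamic] -> ([Int], Int)
drainWithInfo = foldr step ([], 0)
  where
    step m (ns, k) = case fromDynamic m of
      Just n  -> (n : ns, k)
      Nothing -> (ns, k + 1)

-- Two expected requests interleaved with two unsolicited messages.
sampleMailbox :: [Dynamic]
sampleMailbox = [toDyn (1 :: Int), toDyn "noise", toDyn (2 :: Int), toDyn True]
```

With sampleMailbox, drainTyped handles both Int requests but leaves the two unsolicited messages queued, and each later selective receive has to scan past them again. drainWithInfo handles the same requests and leaves nothing behind.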

First, can one reliably use both at the same time to receive messages? I see in the API that it is possible to merge receive ports for channels and that it is possible to receiveWait on many types of messages, but could both receiveChan and receiveWait be used at the same time by a single process? Or does the design assume a process implementation has to choose one over the other?

That's an interesting question; I'm guessing you've got to choose one or the other.

hyperthunk commented 12 years ago

In RabbitMQ, like most Erlang code bases, we use 'EXIT' and 'DOWN' extensively. Cloud Haskell doesn't appear to support 'EXIT' signals in the same way Erlang does

I'd better qualify that briefly though, so @edsko doesn't think I've gone nuts. What I mean is that Cloud Haskell seems to have a different take on trapping exits (and I'm not sure about sending 'EXIT' signals to other processes without killing yourself either), which makes the design of various things surprisingly different. But as I said above, we can discuss that later.

hyperthunk commented 12 years ago

Oh and @rodlogic

This brings me back to the point I made earlier: a server is a collection of services, and these services could be either calls or casts/info, newly defined by the designer of the server and/or reused from other servers/modules.

E.g. to illustrate the point:

Server: Counter
    service1: call(count :: () -> Int)
    service2 : cast(reset :: () -> ())                -- this is a new service for counter
    imported service2: cast(NodeDown -> ())  -- this is the piece that could be defined in a node module

I'm certain this will all look very pretty once we're done and require minimal wiring on the part of server authors. Next steps are to make some decisions about the segregation (or not) of channels and regular messages, make the code testable and reliable, then performance. Finally, usability comes into play, hopefully without sacrificing any of the others. If we can make design decisions now that aid all of those concerns at once, then I'm all for that.

rodlogic commented 12 years ago

Ok, considering that there are use cases for untyped handle_info messages AND the fact that it is not possible to handle both channel and process messages at the same time (this is based on a scan of Cloud Haskell's API), we should probably put the idea of typed channels aside for now and focus on process messages so we can move on.

If there is a need in the future to leverage typed channels, and assuming they can co-exist with process messages, we can always re-evaluate. Hopefully, the public GenServer API should hide most of that from its users anyway.

What do you think? We could revert some of the recent changes back to process messaging and then go through a few iterations on the base GenServer API plus some tests.

edsko commented 12 years ago

First, can one reliably use both at the same time to receive messages? I see in the API that it is possible to merge receive ports for channels and that it is possible to receiveWait on many types of messages, but could both receiveChan and receiveWait be used at the same time by a single process? Or does the design assume a process implementation has to choose one over the other?

The interplay between typed channels and process messages was not considered in the original paper. I have proposed one way to deal with this by introducing a new primitive

expectChan :: Serializable a => Process (ReceivePort a)

which would make it possible to receive messages of a certain type as a receive port, which can then be merged with other receive ports. The implementation of expectChan is not entirely trivial though, and some of the spec needs to be fleshed out (for instance, presumably it should still be possible to receive those messages using a normal expect as well).

So as things stand, if you want to wait for either a MonitorNotification or some other message, then that other message cannot arrive on a typed channel. You can either try to implement expectChan or not use channels for now.
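As a rough illustration of what expectChan is meant to buy, here is a pure model using only base. It is not an implementation of the proposed primitive: lists stand in for the mailbox and the channel, Down is a hypothetical stand-in for a monitor notification, and expectChanModel and mergeRR are made-up names. The round-robin merge is only loosely analogous to merging receive ports.

```haskell
import Data.Dynamic (Dynamic, fromDynamic, toDyn)
import Data.Maybe (mapMaybe)
import Data.Typeable (Typeable)

-- Hypothetical stand-in for a monitor notification.
newtype Down = Down String deriving (Eq, Show)

-- Model of the proposed primitive: view the mailbox messages of one type as
-- a typed stream, ignoring messages of any other type.
expectChanModel :: Typeable a => [Dynamic] -> [a]
expectChanModel = mapMaybe fromDynamic

-- Round-robin merge of two typed streams into one stream of alternatives.
mergeRR :: [a] -> [b] -> [Either a b]
mergeRR [] ys = map Right ys
mergeRR xs [] = map Left xs
mergeRR (x : xs) (y : ys) = Left x : Right y : mergeRR xs ys

-- Untyped mailbox: two notifications and one unrelated message.
sampleMailbox :: [Dynamic]
sampleMailbox = [toDyn (Down "worker"), toDyn "junk", toDyn (Down "db")]

-- Requests arriving on a typed channel.
sampleChannel :: [Int]
sampleChannel = [1, 2, 3]

-- One loop can now wait on both sources through a single merged stream.
events :: [Either Down Int]
events = mergeRR (expectChanModel sampleMailbox) sampleChannel
```

The point of the model is only the shape of the result: once mailbox messages of a given type can be viewed as a port, a server loop can consume Either Down Int from one place instead of choosing between expect and receiveChan.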

edsko commented 12 years ago

A second question I have is wrt multiple channels in a single process. When creating a process with spawnLocal, the returned ProcessId can be used to send any message type, so no complications here. spawnChannelLocal, however, returns a single SendPort. Does the API assume that this SendPort should be used to 'discover' or retrieve additional ports from the process? I am curious whether this was motivated by an explicit design decision that could invalidate the whole idea of using multiple typed channels for a single process.

No, it just captures a common pattern: we want to start a server and send it requests on its SendPort.

The primary reason, in fact, is that the implementation is not trivial: the channel must be created on the remote process (because the ReceivePort cannot be shipped from one node to another).

edsko commented 12 years ago

I'd better qualify that briefly though, so @edsko doesn't think I've gone nuts. What I mean is that Cloud Haskell seems to have a different take on trapping exits (and I'm not sure about sending 'EXIT' signals to other processes without killing yourself either), which makes the design of various things surprisingly different. But as I said above, we can discuss that later.

This is also in large part due to the fact that we use the "Unified Semantics for Future Erlang" semantics, rather than Erlang's semantics, which makes some crucial changes to some of these things. In particular, many more primitives are asynchronous, and linking is unidirectional.

We have been talking a lot about this with Duncan Coutts, Simon Peyton Jones and Francesco Cesarini (from Erlang Solutions), and my (personal) take on all this, given lots of examples, is that trapping exceptions is fraught with difficulties: if you find yourself in a position where you want to trap an exit signal (or, in Haskell parlance, catch an exception), you should consider using monitoring instead.

edsko commented 12 years ago

I think you're missing the point of handle_info. It can be used to deal with secondary protocols and often is, but its primary purpose is to allow the gen server to deal with unexpected traffic. If you don't do something with messages sent to the process then they just fill up your mailbox (which has performance implications) and because the gen server abstraction is meant to manage the mailbox on behalf of the server - which deals with the functional aspects only - we can't just ignore the fact that unsolicited mail could arrive.

Note that Cloud Haskell does provide matchUnknown which can be used to throw away (but not process) messages of unknown type, which you could use for that purpose. If that is not good enough, you might want to use matchAny, but then we are back to having to extend the AbstractMessage API (see https://github.com/haskell-distributed/distributed-process/issues/30 and https://github.com/hyperthunk/distributed-process-platform/issues/4).

hyperthunk commented 12 years ago

@edsko thanks for the clarifications, they're very helpful. I might have a crack at extending AbstractMessage at some point.

rodlogic commented 11 years ago

FYI: see this pull request by Simon Marlow:

https://github.com/haskell-distributed/distributed-process/commit/847abf494233523dba7d0b40628c3af9e870be91

It seems to address the issue of efficiently receiving channel and process messages.

haskell-distributed / distributed-process-platform

Channel vs Process based GenServer #7