supervision trees - Githubissues

hyperthunk commented 12 years ago

policy control

sibling management (1-4-1, 1-4-all, left/right neighbours)
seq/par shutdown (per branch)
timeout/kill (per process?)
exit policy (i.e., transient, temp, perm, etc)
restart intensity (limits)
alternating restart policies (exit type based, global, etc)
backoff/timeout in response to restart limits

questions...

how do we define the startup procedure?
what type does a start spec have???
how can we cleanly represent error signals in a generic way?

hyperthunk commented 12 years ago

On the point about start specs: the Supervisor process needs to spawn these other Processes, and must be able to restart them. It therefore needs to have the clousure information available for a start - is this enough? And do we force all code that starts a supervisor (or asks for another Process to be supervised) to be a Process too? More to the point, what about code that provides the child specifications themselves - does that have to be in the Process monad?

rodlogic commented 12 years ago

I don't have answers to your questions so far, but found this link useful (I am new to Erlang):

http://www.aosabook.org/en/riak.html

Would creating a set of data types describing a supervision tree, including a supervision and child specifications, be a good start?

Are supervision tree usually static (static wrt to supervisors not workers)?

hyperthunk commented 12 years ago

Yes, starting with the type definitions is exactly what I want to do. One of the challenges here is knowing how to distinguish between a normal and an abnormal exit condition. In erlang's supervisor, if a process exits with the atom 'normal' then it is considered a happy exit and is only restarted if the process restart policy is set to 'permanent'. Cloud Haskell's concept of exits appears somewhat different. Also, in Cloud Haskell, a link between the child and supervisor will cause both processes to exit, which isn't what we want. So, we need to use spawnMonitor, but the question is, should that be down to the start function the child spec points to, or down to the supervisor? I want to make this mechanism as user friendly as possible.

On the subject of types, how best to represent the child spec? Should it just be a thunk, with the last few arguments left to be filled in when it is started? Or should it provide complete transitive closure over all the arguments used to start the server each time, as passed by the initial caller?

rodlogic commented 12 years ago

After reading a bit about Erlang and supervision trees/supervisor module, I have a few questions/observations I would like to share (note that I don't have much real Haskell and Erlang experience, so take this with a grain of salt):

(general) behaviors + (user-defined) callbacks - how should we translate this notion to Haskell?

First of all, each behavior could be in it's own module. This makes sense since it's one API per module used potentially by all the other modules. So, we should have a Control.Distributed.Platform.Supervisor module exporting at least the following functions:

startLink
startChild
terminateChild
deleteChild
restartChild
whichChildren
countChildren
checkChildSpecs

Second, we could use type classes to represent callbacks. A type class specifies a well-defined interface that user-defined modules can implement. E.g.:

module Control.Distributed.Platform.Supervisor where

class Supervisor s where
    toSpec :: s -> Maybe (SupervisorSpec, [ChildSpec]) -- ^ toSpec is a rename of init

Here, s would be a typed version of Args. The problem here is that each different Supervisor will require a different s, but for now I'll ignore this.

Then, a user defined module could define a supervisor:

module MySupervisor where

newtype MySuper = MySuper () deriving Show

instance Supervisor MySuper where
    toSpec _ =  Just (SupervisorSpec OneForOne 1 60, [
                                       ChildSpec "Child1" () Permanent (Shutdown 60) Worker
                                      ])

And the generic Supervisor module would define startLink as:

startLink :: (Supervisor s) => s -> Process ()

I am not sure about Process () here, but it seems like a good choice since startLink would (should?) be called in the context of a Process. In any case, startLink would be implemented along these lines:

startLink s = do
    let spec = toSpec s
    (sid, sname) <- processSpec spec
    ....
    return ()

It seems that an Erlang module can implement more than one behavior (?). This also works well with type classes.
Should a behavior translate to monad a wraps the Process monad (this could get out of hand if multiple behaviors are needed)? Or should there be a single, parameterized monad for behaviors that wraps Process monad and works together with per behavior type classes?
Supervisor Data types. This seems to be most straightforward aside from specific refinements.

-- |  
data RestartStrategy
    = OneForOne 
    | RestForOne 
    | OneForAll
    | SimpleOneForOne
      deriving Show

-- | 
type AllowedRestarts = Int

-- |
type MaxSeconds = Int

-- | 
data SupervisorSpec = SupervisorSpec { 
      restartStrategy :: RestartStrategy, 
      allowedRestarts :: AllowedRestarts,
      maxSeconds :: MaxSeconds
    } deriving Show

-- |
type Id = String

-- |
data Shutdown 
    = Shutdown Int 
    | Infinity
      deriving (Show)

-- |
data Restart
    = Transient 
    | Temporary 
    | Permanent
      deriving Show

-- |
data ChildType 
    = Worker
    | Supervisor
      deriving Show

type ChildName = String

-- |
data ChildSpec = ChildSpec { 
      childName :: ChildName, 
      childCallback :: (), 
      childRestart :: Restart, 
      childShutdown :: Shutdown, 
      childType :: ChildType
    } deriving Show

how can we test these modules in isolation?

hyperthunk commented 12 years ago

Hi! Thanks for that feedback, it's very useful. Some thoughts inline:

First of all, each behavior could be in it's own module. This makes sense since it's one API per module used potentially by all the other modules. So, we should have a Control.Distributed.Platform.Supervisor module exporting at least the following functions:

startLink

That's pretty much what I was thinking.

Second, we could use type classes to represent callbacks. A type class specifies a well-defined interface that user-defined modules can implement. E.g.:

module Control.Distributed.Platform.Supervisor where

class Supervisor s where
    toSpec :: s -> Maybe (SupervisorSpec, [ChildSpec]) -- ^ toSpec is a rename of init

Here, s would be a typed version of Args. The problem here is that each different Supervisor will require a different > s, but for now I'll ignore this.

Yes, that's a bit of a bind. It's very likely that you'll want to supervise numerous different Process instances that are started in different ways (i.e., using different args). For example, consider the network layer supervisor for RabbitMQ:

init({IPAddress, Port, SocketOpts, OnStartup, OnShutdown,
      AcceptCallback, ConcurrentAcceptorCount, Label}) ->
    Name = rabbit_misc:tcp_name(tcp_acceptor_sup, IPAddress, Port),
    {ok, {{one_for_all, 10, 10},
          [{tcp_acceptor_sup, {tcp_acceptor_sup, start_link,
                               [Name, AcceptCallback]},
            transient, infinity, supervisor, [tcp_acceptor_sup]},
           {tcp_listener, {tcp_listener, start_link,
                           [IPAddress, Port, SocketOpts,
                            ConcurrentAcceptorCount, Name,
                            OnStartup, OnShutdown, Label]},
            transient, 16#ffffffff, worker, [tcp_listener]}]}}.

If the elements in the Spec for the children that need to conform to a single type, then that is much harder. Perhaps we can work around this by making the user pass a ready made Closure? Whilst this seems a bit annoying, including the start args using (Serializable t1) => t1 isn't enough as we don't know what function to call from that alone.

And the generic Supervisor module would define startLink as:

startLink :: (Supervisor s) => s -> Process ()

I am not sure about Process () here, but it seems like a good choice since startLink would (should?) be called in the context of a Process.

Well startLink should definitely be called in the context of a process - the supervisor must have started up before it spawns any children, so that the link/monitor actions remain atomic and we avoid any nasty races in the (re)start sequence.

In any case, startLink would be implemented along these lines:

startLink s = do
    let spec = toSpec s
    (sid, sname) <- processSpec spec
    ....
    return ()

Surely we need to carry the s around with us in our process state, so we'd actually want something like startLink :: (Supervisor s) => s -> Process s instead?

It seems that an Erlang module can implement more than one behavior (?). This also works well with type classes.

I'm happy with the idea of using modules and type classes where we actually need something akin to behaviours, like the supervisor case.

Supervisor Data types. This seems to be most straightforward aside from specific refinements.

snip....

Yes, this is exactly right, bar (as you say) a few refinements.

edsko commented 12 years ago

Just a few brief remarks:

You mention that "Also, in Cloud Haskell, a link between the child and supervisor will cause both processes to exit". That is not the case. Linking in Cloud Haskell is unidirectional. So, in a supervisor tree, it makes sense for the child to link to the supervisor, and the supervisor to monitor the child.
Termination: If the supervisor is monitoring the child then it receives a monitor notification when the child exists; the monitor notification will inform the supervisor if this was a normal or abnormal termination, just like in Erlang.
I am not convinced that type classes are the way to go here. Perhaps consider packaging up the relevant functions in a record instead.
Leading on from the previous point, one thing one might consider if having pure "processes" instead, rather like the Map Reduce skeleton in my [http://www.well-typed.com/blog/73](blog post). This fits with the intended usage of Erlang behaviours, which are designed to separate out the "pure" and the "concurrent" parts; and of course, purity can be enforced by Haskell's type system. This leads to something akin to the "Task" layer in the original Cloud Haskell prototype.

hyperthunk commented 12 years ago

@edsko - thanks for this!

Just a few brief remarks:

You mention that "Also, in Cloud Haskell, a link between the child and supervisor will cause both processes to exit". That is not > the case. Linking in Cloud Haskell is unidirectional. So, in a supervisor tree, it makes sense for the child to link to the supervisor, > and the supervisor to monitor the child.

Ok that's a good point, but thankfully I was just being slow there. I had already assumed that startSupervised was doing that, so it looks like I'm on the right track there, at least in my head! :)

Termination: If the supervisor is monitoring the child then it receives a monitor notification when the child exists; the monitor > notification will inform the supervisor if this was a normal or abnormal termination, just like in Erlang.

Cool - I just noticed something in the tests to that effect, so I'll build on what I can see there.

I am not convinced that type classes are the way to go here. Perhaps consider packaging up the relevant functions in a record > instead.

I did think about doing that, but I wondered whether the proper type signature would be hard to figure out. Having said that, if we make the init/startup data a Closure then I guess this isn't a problem.

Leading on from the previous point, one thing one might consider if having pure "processes" instead, rather like the Map Reduce skeleton in my [http://www.well-typed.com/blog/73](blog post). This fits with the intended usage of Erlang behaviours, which are designed to separate out the "pure" and the "concurrent" parts; and of course, purity can be enforced by Haskell's type system. This leads to something akin to the "Task" layer in the original Cloud Haskell prototype.

Thanks for pointing that out. I'll go through and read it as well as looking at the Task layer ideas too.

rodlogic commented 12 years ago

Just a few brief remarks:

I am not convinced that type classes are the way to go here. Perhaps consider packaging up the relevant functions in a record > instead.

I did think about doing that, but I wondered whether the proper type signature would be hard to figure out. Having said that, if we make the init/startup data a Clojure then I guess this isn't a problem.

I also hit another issue with using type classes as the formal contract between the "pure" and "concurrent" layers: type inference can't figure out the proper message type. There is a chicken and egg problem here trying to infer the type.

My main motivation for considering type classes as the pure-concurrent contract was that it would become dead simple for anyone to implement a server/task since even the signatures are figured out for you: just plug your own types and provide an implementation.

Leading on from the previous point, one thing one might consider if having pure "processes" instead, rather like the Map Reduce skeleton in my http://www.well-typed.com/blog/73 http://blog%20post. This fits with the intended usage of Erlang behaviours, which are designed to separate out the "pure" and the "concurrent" parts; and of course, purity can be enforced by Haskell's type system. This leads to something akin to the "Task" layer in the original Cloud Haskell prototype.

Thanks for pointing that out. I'll go through and read it as well as looking at the Task layer ideas too.

While looking at Erlang's Supervisor code this weekend, I noticed that it is implemented in terms of the gen_server behavior, which enforces a well-defined client/server protocol between the pure and concurrent layer in a server (would a server in Erlang's world be the same as what you refer to as a 'Task'?). This seems to be a critical area that we should be focusing first, no?

The gen_server behavior in Erlang:

a server has a well-defined state-machine it goes through: it is spawn, initialized, loop and terminate
a server can return a timeout on initialization that will shutdown the process if no message is received with the specified time.
each server is implemented in a single Erlang module (convention)
a server module contains a public API part and internal callbacks for each of the phases above
a server has/can have state maintained across callbacks for the duration of the process.
a server supports one-way messages (simple send, no reply), a standard, synchronous request/reply, and a request/reply were the reply is either performed by another process or at another time.

hyperthunk commented 12 years ago

@rodlogic - you make a good point about the gen_server basis for OTP's supervisor. But....

The gen_server concept is quite different to implement in Haskell. Differentiating between for example handle_call, handle_cast and handle_info is done in OTP by the callback module providing implementations of these functions of the same name. Now how do you do this in Haskell - what is the type signature of those functions? Are you saying that the type signature will be handleCast :: (Typeable a, StateTransition b) => a -> b and what does the signature for handleCall look like, when we've got to return the state transition and the reply? What type does the reply have - is this just Typeable a or something? Because we know nothing about the messages that the server will receive in the general case, nor about the responses we're expected to handle sending back to the client.

So whilst you're right in saying that the gen_server concept of abstracting the concurrency model out from the implementation (how to do blocking versus non-blocking calls, timeouts, hibernation, etc), it's not at all clear to me what this would look like in Cloud Haskell.

Perhaps once I've read the links that @edsko posted it'll be clearer, and in general, yes I agree that if there is something like gen_server that would fit the bill for Cloud Haskell then we should build supervisors on top of this.

edsko commented 12 years ago

I think an approach based on a record tells people exactly what they need to implement; I really don't see the advantage of a type class here. For instance, consider the map reduce skeleton from my blog post that I mentioned earlier:

data MapReduce k1 v1 k2 v2 v3 = MapReduce {
    mrMap    :: k1 -> v1 -> [(k2, v2)]
  , mrReduce :: k2 -> [v2] -> v3
 }

Now when people give an instance of a map-reduce skeleton they simply say

example :: MapReduce <type variables>
example = MapReduce { mrMap = .. ; mrReduce = .. }

and the types are clear (in the map-reduce example is probably more generic than gen_server needs to be because the types of the internal results can be different from the input and output types).

It's not obvious to me what the main problem is that makes it difficult to port Erlang's ideas to Cloud Haskell; is it the problem that we need to specify the type of the messages? In that case a new feature of Cloud Haskell might be useful:

matchAny :: forall b. (AbstractMessage -> Process b) -> Match b

This can be used a process (such as the implementation of the gen_server equivalent) to match against a message of any type; at the moment, the only operation on AbstractMessage is forward, but other operations can easily be added (for instance, one might envision that we could match against an AbstractMessage). Does that help, or do I misunderstand the problem?

hyperthunk commented 12 years ago

It's not obvious to me what the main problem is that makes it difficult to port Erlang's ideas to Cloud Haskell; is it the > problem that we need to specify the type of the messages? In that case a new feature of Cloud Haskell might be useful:

matchAny :: forall b. (AbstractMessage -> Process b) -> Match b

This can be used a process (such as the implementation of the gen_server equivalent) to match against a message > of any type; at the moment, the only operation on AbstractMessage is forward, but other operations can easily be added (for instance, one might envision that we could match against an AbstractMessage). Does that help, or do I misunderstand the problem?

No, that's precisely what we need. At the moment, it's awkward to design an API that processes any message in the abstract, whereas having matchAny would make it simple enough to pull them off the mailbox and shove them into the callback(s). That let's us get arbitrary values out of our inbox.

Now please forgive me for being excruciatingly slow with the use of the type system here - it's not intentional! Given the pseudocode type/record definitions below, what is the return type of handle{Call,Cast} going to be? That is the bit I'm struggling with. Do we just have another type class which allows us to deal with the reply and/or stopping reason, or is there some other really obvious way to do this that I'm missing?


class ServerStop a where
    reason   :: a -> String
    isNormal :: a -> Bool

-- pseudo code, just pointing out the use of ServerStop
data ServerReply (ServerStop b) => a = Reply a | Stop b | Hibernate Int

-- pseudocode
data ServerNoReply (ServerStop a) => a = Ok | Stop a | Hibernate Int

data Server = Server {
    handleCall  :: (AbstractMessage m) => m -> ServerReply ?T
  , handleCast :: (AbstractMessage m) => m -> ServerNoReply ?T
 }

rodlogic commented 12 years ago

I have committed a few files I have been working on this weekend for your reference.

Here is one file that offers nothing new, but is a naive implementation of a simple Erlang server without any abstractions (i.e. using CloudHaskell directly and no reuse). I figure it is better to start bottom up to ground the progress in meaningful ways.

https://github.com/rodlogic/distributed-process-platform/blob/master/src/Control/Distributed/Naive/Kitty.hs

Note that I am ignoring completely some of the more advanced concepts such as timeouts, hibernation or multi-hop req/responses here.

And here is my first attempt at slicing Kitty.hs into a a generic Server.hs using type classes:

https://github.com/rodlogic/distributed-process-platform/blob/master/src/Control/Distributed/Naive/Server.hs

However, the more I think about this, the more I begin to appreciate even more Edsko's suggestion of sticking to a record representation. Type inference issues aside, I am not so sure a type class solution will "scale" to other behaviors we will need to address (e.g. gen_event or gen_fsm).

What about a different approach altogether (at least different to me coming from an imperative world): a functional reactive programming approach to implementing CouldHaskell servers/processes? I.e. a way declaratively and sequentially wire together 1 or more input event streams (ReceivePort's or expect msgs) and 0 or more event sinks (SendPort's or send msgs) in a pure do block. The same abstraction could be used to implement different kinds of processes from a simple one-way or req/resp server to much more complicated state machines. I am not even suggesting we use one of the FRP libraries out there: it could even be a simple design tailored to CloudHaskell.

I will explore this FRP idea a bit more in the next few days, but if you have any feedback about such direction let me know. Note that part of my difficulties here has less to do with the current CloudHaskell API or abstractions but more about Haskell in general as I don't have a whole lot of experience with it compared to the experience I have in the OO world. I am, though, fully aware that my initial instincts tend to take me into a OO route, which is not a good thing here :-)

edsko commented 12 years ago

Now please forgive me for being excruciatingly slow with the use of the type system here - it's not intentional! Given the pseudocode type/record definitions below, what is the return type of handle{Call,Cast} going to be? That is the bit I'm struggling with. Do we just have another type class which allows us to deal with the reply and/or stopping reason, or is there some other really obvious way to do this that I'm missing?

I'm still really unsure what the benefit is of all these type classes that you guys are proposing. A type class is useful when you want some behaviour (in the generic, non-Erlang-technical, sense) to be implicit. "Objects of this type are comparable, and I don't want to have to tell you how to compare them all the time". In this case we are describing a process as a collection of callbacks; that calls for a record, not for a type class. You would need to invent a data type (like you are doing above) simply so that you can define a type class instance for it. What's the point?

So, AbstractMessage isn't a type class either; it's another record, so the type would be

data Server = Server {
  handleCall  :: AbstractMessage -> ServerReply ?T
 }

instead. As regards the ?T bit: I'm not sure. Again, why is ServerReply a type class? What would you normally return from handleCall in Erlang?

edsko commented 12 years ago

As regards FRP: I think that's too ambitious. You are not the first person to consider it -- there is a recent blog post about it -- but I think it would take us too far into the realm of academic research and too far from the realm of actually getting stuff done today.

rodlogic commented 12 years ago

Agreed. That is the same conclusion I came yesterday night after reading a bit on it. That kind of challenge is not really up my alley and a big distraction at this point.

hyperthunk commented 12 years ago

@edsko this is good - now we're getting somewhere. I'm not interested in using type classes for this at all - I just can't figure out how to make the return type of the various handle_* functions general enough to hold anything that can be sent back to the user.

As regards the ?T bit: I'm not sure. Again, why is ServerReply a type class? What would you normally return from handleCall in Erlang?

Well that's exactly the point - you would return any bit of data that you wanted sent back to the client, of any type. You'd also need to indicate whether you want to reply, stop or continue without replying - we'll need a type with the appropriate constructors for that presumably.

The type of the data being sent back can change between invocations of handle_call as well of course, because what the generic server does is something (conceptually in Erlang) like this:


loop(Mod, State) ->
    receive
        {'$gen', call, {From, Msg}} ->
            case Mod:handle_call(Msg, From, State) of
                {stop, Reason} ->
                    terminate(Mod, State, Reason);
                {reply, Reply, State2} ->
                    erlang:send(From, Reply),
                    loop(Mod, State2);
                {noreply, State3} ->
                    loop(Mod, State3)
            end
    end

So the return value of the callback functions is very open - something like:


-type sender :: {pid(), ref()}.
-type state :: any().
-spec handle_call(Msg::any(), From::sender(), State::state()) -> {'reply', any(), state()} | {noreply, state()} | {stop, any()}.

Does this mean that we should have a record that looks something like this?

data Server = Server {
  handleCall :: AbstractMessage -> Typeable  -- or should it be Serialisable?
 }

And also, where would the server state come into this function's type? Presumably we need a type parameter somewhere, which maps to the a in Process a right?

I hope that makes a bit more sense now. We need to go from reading an AbstractMessage to sending something back. The data ServerReply is just a convenience so we can handle the various stop, reply, noreply, hibernate state changes as well as the response data.

edsko commented 12 years ago

Have a look at https://gist.github.com/4025934. Very simplistic, doesn't do any error handling or timeouts or anything at all really, but at least it explains the kind of structure I'd expect.

hyperthunk commented 12 years ago

Thanks @edsko that makes a bit more sense now. It was the use of Serializable that wasn't clear to me, and now I see the record doesn't need to specify too much type information which helps.

edsko commented 12 years ago

Yeah. With a few minor additions to the AbstractMessage interface you could remove more type information; I've thought a bit about this and summarized it at https://github.com/haskell-distributed/distributed-process/issues/30#issuecomment-10119000. But, as I say in that comment, I actually think that the semi-typed version in that gist is better. After all, this is Haskell -- we should strive for more, not fewer, types :)

rodlogic commented 12 years ago

The above code sample gives us a good direction, indeed. A good basis for moving forward with the genserver design.

I have a few additional questions (related/unrelated to genserver/platform):

What is the logging strategy in distributed-haskell (and haskell in general)? Ideally with log levels, etc.
Related to the above question: are there specific strategies you use for debugging/tracing process messages, mailbox queues, etc?
How expensive are typed channels in distributed-haskell? Could they be used to create throw away/ad-hoc reply channels in situations where the client is either not a full-fledged process or doesnt have a permanent relation to the server?

edsko commented 12 years ago

What is the logging strategy in distributed-haskell (and haskell in general)? Ideally with log levels, etc.

There isn't any baked in. Cloud Haskell backends (or indeed distributed-process-platform) add this on top of the basic infrastructure. Good distributed logging is not an easy problem, so we didn't feel that it was appropriate to make it part of the core library (indeed, there may be many alternative approaches).

Related to the above question: are there specific strategies you use for debugging/tracing process messages, mailbox queues, etc?

Hehe. My strategy currently is: modify the core Cloud Haskell library and stick print statements in the right place :-P I would agree that this is non-optimal! :) Please feel free to open an issue about this on the distributed-process wiki, and list some things that you'd like to have.

How expensive are typed channels in distributed-haskell? Could they be used to create throw away/ad-hoc reply channels in situations where the client is either not a full-fledged process or doesnt have a permanent relation to the server?

They are cheap, and are certainly intended to be useful in scenarios such as the one you describe.

hyperthunk commented 12 years ago

@edsko

Yeah. With a few minor additions to the AbstractMessage interface you could remove more type information; I've thought a bit about this and summarized it at haskell-distributed/distributed-process#30. But, as I say in that comment, > I actually think that the semi-typed version in that gist is better. After all, this is Haskell -- we should strive for more, not fewer, types :)

Yes I totally agree, I just couldn't grok how to do the 'halfway typed' thing but I see now and it makes good sense as it is.

Good distributed logging is not an easy problem, so we didn't feel that it was appropriate to make it part of the core library (indeed, there may be many alternative approaches).

OTP doesn't do distributed logging either, but the System Architecture Support Libraries (badly named as SASL has other meanings) do provide a configurable logging subsystem. @rodlogic I'm inclined to just ignore logging for now and revisit it later.

How expensive are typed channels in distributed-haskell? Could they be used to create throw away/ad-hoc reply > > channels in situations where the client is either not a full-fledged process or doesnt have a permanent relation to > > the server?

They are cheap, and are certainly intended to be useful in scenarios such as the one you describe.

Could one of you guys gist an example of how that would work? I think it sounds like a very useful optional model for transient/one-off GenServer clients that aren't inside the Process monad.

hyperthunk commented 12 years ago

Oh and BTW @edsko presumably any exception thrown by a process is propagated as an exit signal to any process that is monitoring it right? I ask because as I'm playing around with your GenServer example the one thing I don't want to do is handle errors explicitly, as that's going to be the supervisor's job! :)

edsko commented 12 years ago

Could one of you guys gist an example of how that would work? I think it sounds like a very useful optional model for transient/one-off GenServer clients that aren't inside the Process monad.

So in my GenServer example I send the server the Process ID of the client to reply. An alternative would be something like

call :: (Serializable a, Serializable b) => Name -> a -> Process b
call name request = do
  (sport, rport) <- newChan
  nsend name (sport, request)
  receiveChan rport

where the client creates a new typed channel, sends the server the send port, and waits on the receive port. This kind of use-once typed channels are very commonplace in process calculi.

rodlogic commented 12 years ago

@edsko Yes, a distributed logging would be nice but it is not an easy problem not to mention that it could be done in many different ways. I think that both distributed-process (dp) and distributed-process-platform (dpt) would benefit from a simple std logging facility even if that means that each OS process performs logging independently to stdout or a file. I will add a ticket for a basic logging facility with a few requirements that seem important.

@hyperthunk Considering that distributed logging is more of a platform feature, what about a ticket here so we can track it and collect comments over time?

edsko commented 12 years ago

Oh and BTW @edsko presumably any exception thrown by a process is propagated as an exit signal to any process that is monitoring it right? I ask because as I'm playing around with your GenServer example the one thing I don't want to do is handle errors explicitly, as that's going to be the supervisor's job! :)

If you are monitoring a process and that process throws an exception you receive a monitor notification, a message like another other, so I'm not sure what you mean by "propagate". If you want that an exception gets thrown in the "monitoring" process you should be using linking instead.

And how does the supervisor kill the children politely (i.e., when we're not opting for brutal_kill as the termination option) - I see the usual suspects in the exception handling API but not throwTo - nor does there appear to be an exit(Pid, reason) corollary.

edsko commented 12 years ago

As I mentioned before, I don't think that logging should be added to the core Cloud Haskell infrastructure. However, a standard distributed-process-logging package, independent of distributed-process-platform might be something to consider.

Jiansen commented 12 years ago

Hi Tim,

Following are decisions made in akka (http://akka.io/ and http://doc.akka.io/api/akka/2.0.4/)

-- how do we define the startup procedure? Using the actorOf method, each ActorSystem (a supervision tree) has a specification, and actors are started inside an ActorSystem according to that specification.

-- what type does a start spec have??? Akka defines a set of elements for Actor System Specification. In Akka, the specification is parsed from a String. Obviously, we could give each element a type and let the type of the specification be a list of the supertype of all element types.

-- how can we cleanly represent error signals in a generic way? Using subclasses of the PossiblyHarmful trait.

As the type system of Haskell is stronger than the type system of Scala, I think you need to do more work than what the akka team did, but the akka design may help your initial design.

Cheers Jiansen

hyperthunk commented 12 years ago

@Jiansen thanks for pointing that out - I am taking some inspiration from akka. You're right about the type system requiring a bit more work but that'll pay for itself later on. :)

hyperthunk commented 12 years ago

So in supervisor, are we actually able to figure out whether the exit succeeded in the expected way or not? Since @rodlogic implemented exit/kill signals in Cloud Haskell (see issue 69) we should be able to do something akin to this:

shutdown(Pid, brutal_kill) ->
    case monitor_child(Pid) of
    ok ->
        exit(Pid, kill),
        receive
        {'DOWN', _MRef, process, Pid, killed} ->
            ok;
        {'DOWN', _MRef, process, Pid, OtherReason} ->
            {error, OtherReason}
        end;
    {error, Reason} ->
        {error, Reason}
    end.

hyperthunk commented 12 years ago

This depends on issue #4 being completed.

hyperthunk commented 12 years ago

I've just finished working on a real-world Erlang/OTP supervisor implementation - we maintain our own version in rabbit with some extra features and I've recently merged that with two major version changes in the base OTP version from which it was forked - and I hope these will provide some inspiration for us. The commit log should provide some useful ideas.

rodlogic commented 12 years ago

I will need a couple of days to incorporate the new exit/catchExit into GenServer as I have work related items to get done first. It should be straightforward, though.

hyperthunk commented 11 years ago

Initial sketch of a supervisor: https://github.com/haskell-distributed/distributed-process-platform/blob/supervisor/src/Control/Distributed/Process/Platform/Supervisor.hs

haskell-distributed / distributed-process-platform

supervision trees #1