hyperthunk opened this issue 12 years ago
Yes, debugging support is definitely worth adding. It would also be useful to add metrics such as message queue size (possibly broken down by message type).
Erlang uses Lamport clocks internally to give an ordering on the traces. As far as I can tell, the trick is that every message carries a (Maybe (Clock,Destination)), and the primitives manage and pass that along if it is present. It would be relatively simple to extend the internal process state with an optional Lamport clock and expose it directly. There is of course a small overhead even when the clock isn't in use, but I imagine it could be useful in a number of circumstances. My suggestion would be to have the state contain both a Maybe Clock and a [Trace Destinations], so that we decouple the ordering provided by the clock (useful even without a trace) from the trace functionality (somewhat useful even without a clock).
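A minimal sketch of the propagation scheme described above, assuming a hypothetical Envelope type that wraps every message with an optional clock (the Destination half of the tuple is left out for brevity). Clock, Envelope, ProcState, stampSend and stampReceive are illustrative names only, not distributed-process API. Sending ticks the local clock; receiving applies Lamport's rule, max(local, remote) + 1.

```haskell
-- Sketch only: Clock, Envelope, ProcState, stampSend and stampReceive are
-- hypothetical names, not part of distributed-process.
module Main where

newtype Clock = Clock Int deriving (Eq, Ord, Show)

-- Every message carries an optional clock (the "Maybe Clock" part).
data Envelope a = Envelope
  { envClock   :: Maybe Clock
  , envPayload :: a
  } deriving Show

-- Process state keeps ordering and tracing separate, as suggested:
-- the clock is optional and the tracer list is independent of it.
data ProcState = ProcState
  { procClock   :: Maybe Clock
  , procTracers :: [String]  -- hypothetical tracer identifiers
  } deriving Show

tick :: Clock -> Clock
tick (Clock n) = Clock (n + 1)

-- Sending ticks the local clock (if any) and stamps the message with it.
stampSend :: ProcState -> a -> (ProcState, Envelope a)
stampSend st msg =
  let c' = tick <$> procClock st
  in (st { procClock = c' }, Envelope c' msg)

-- Receiving applies Lamport's rule: max(local, remote) + 1.
-- A process without a clock adopts the sender's; an unstamped receive
-- still counts as a local event and ticks an existing clock.
stampReceive :: ProcState -> Envelope a -> (ProcState, a)
stampReceive st (Envelope mc msg) =
  let merged = case (procClock st, mc) of
        (Just (Clock l), Just (Clock r)) -> Just (Clock (max l r + 1))
        (Nothing,        Just (Clock r)) -> Just (Clock (r + 1))
        (local,          Nothing)        -> tick <$> local
  in (st { procClock = merged }, msg)

main :: IO ()
main = do
  let a0        = ProcState (Just (Clock 5)) []
      b0        = ProcState (Just (Clock 2)) []
      (a1, env) = stampSend a0 "ping"
      (b1, msg) = stampReceive b0 env
  print (procClock a1)  -- Just (Clock 6)
  print (procClock b1)  -- Just (Clock 7)
  putStrLn msg          -- ping
```

Untraced processes only pay for a Nothing in each envelope, which is the "small overhead even when the clock isn't in use" mentioned above.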
I think we should separate concerns about distributed ordering (like Lamport clocks), which can be implemented on top of the core infrastructure, from hooks into the guts of the system that allow us to extract the relevant information. I'm not convinced that the core libraries need to do the former, but they obviously do need to do the latter.
Sure; my concern is just that traces are less useful without some ordering on causality. A trace mechanism that lets userland Lamport clocks be hooked in (i.e. some customizable action that generates the traces) would indeed probably be cleaner and more elegant.
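One way to picture that split, as a hedged sketch: the core exposes a hypothetical TraceHook (some user-defined state plus a step function) and simply threads runtime events through it, while userland decides whether and how to order them. RuntimeEvent, TraceHook, runHook and both hooks are invented for illustration; a real hook would presumably run in IO inside the runtime rather than as a pure fold.

```haskell
-- Sketch only: none of these types exist in distributed-process.
module Main where

-- Hypothetical runtime events the core would expose to hooks.
data RuntimeEvent
  = Sent String String      -- sender, recipient
  | Received String String  -- recipient, sender
  deriving (Eq, Show)

-- Hypothetical hook: user-defined state s plus a step function producing
-- a trace record t. The core threads the state through without inspecting
-- it, so ordering schemes (counters, Lamport or vector clocks) stay out of
-- the core library.
data TraceHook s t = TraceHook
  { hookState :: s
  , hookStep  :: RuntimeEvent -> s -> (s, t)
  }

-- Stand-in for the runtime: feed each event to the hook, collect output.
runHook :: TraceHook s t -> [RuntimeEvent] -> [t]
runHook (TraceHook s0 step) = go s0
  where
    go _ []       = []
    go s (e : es) = let (s', t) = step e s in t : go s' es

-- Userland choice 1: no ordering at all, just record the events.
plainHook :: TraceHook () RuntimeEvent
plainHook = TraceHook () (\e s -> (s, e))

-- Userland choice 2: stamp each event with an incrementing counter.
counterHook :: TraceHook Int (Int, RuntimeEvent)
counterHook = TraceHook 0 (\e n -> (n + 1, (n + 1, e)))

main :: IO ()
main = do
  let events = [Sent "a" "b", Received "b" "a", Sent "b" "c"]
  mapM_ print (runHook plainHook events)
  mapM_ print (runHook counterHook events)
```

A Lamport clock would slot in as just another userland hook, with its clock as the hook state.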
This causality is also the main difficulty in implementing generic distributed logging. Perhaps that is the core concept that should be implemented, possibly as a separate package such as distributed-process-logging.
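Assuming each node writes log entries stamped with a Lamport clock (the LogEntry type below is hypothetical), merging them is just a sort on (clock, node). This is Lamport's total order: causally related events always come out in the right order, while concurrent ones are ordered arbitrarily but deterministically.

```haskell
-- Sketch only: LogEntry and mergeLogs are hypothetical names.
module Main where

import Data.List (sortOn)

-- Hypothetical log entry: Lamport timestamp, originating node, message.
data LogEntry = LogEntry
  { entryClock :: Int
  , entryNode  :: String
  , entryText  :: String
  } deriving (Eq, Show)

-- Merge per-node logs into one sequence. Sorting on (clock, node) gives
-- Lamport's total order: if event x causally precedes y then x's clock is
-- strictly smaller, so x always appears first; ties between concurrent
-- events are broken deterministically by node name.
mergeLogs :: [[LogEntry]] -> [LogEntry]
mergeLogs = sortOn (\e -> (entryClock e, entryNode e)) . concat

main :: IO ()
main = do
  let nodeA = [LogEntry 1 "a" "start", LogEntry 2 "a" "send ping to b"]
      nodeB = [LogEntry 1 "b" "start", LogEntry 3 "b" "got ping from a"]
  mapM_ (putStrLn . entryText) (mergeLogs [nodeB, nodeA])
```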
Erlang supports both kinds of tracing. The Lamport-clock-based one that @gbaz mentioned is http://www.erlang.org/doc/man/seq_trace.html, whereas the dynamic process tracing I mentioned is a separate, complementary feature built into the runtime. A vector clock (vclock) based tracing feature would be nice, but should be a separate package IMO.
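To illustrate what a vector clock adds over a Lamport clock: a Lamport timestamp respects causal order but cannot tell you that two events were concurrent, whereas a vector clock can. A minimal sketch using Data.Map from the containers package, with node names as hypothetical String keys (VClock, tickV, receiveV and the comparison functions are illustrative names only):

```haskell
-- Sketch only: a vector clock with one counter per node name.
module Main where

import qualified Data.Map.Strict as Map

type VClock = Map.Map String Int

-- Local event (or send) on node n: bump n's own entry.
tickV :: String -> VClock -> VClock
tickV n = Map.insertWith (+) n 1

-- Receive on node n: pointwise maximum of both clocks, then tick.
receiveV :: String -> VClock -> VClock -> VClock
receiveV n local remote = tickV n (Map.unionWith max local remote)

-- x happened-before y iff x <= y pointwise and x /= y
-- (missing entries count as 0).
happenedBefore :: VClock -> VClock -> Bool
happenedBefore x y =
  x /= y && and [Map.findWithDefault 0 k y >= v | (k, v) <- Map.toList x]

-- Distinct events where neither precedes the other are concurrent.
concurrent :: VClock -> VClock -> Bool
concurrent x y =
  x /= y && not (happenedBefore x y) && not (happenedBefore y x)

main :: IO ()
main = do
  let a1 = tickV "a" Map.empty  -- a sends a message to b
      b1 = tickV "b" Map.empty  -- b does something independently
      b2 = receiveV "b" b1 a1   -- b receives a's message
  print (happenedBefore a1 b2)  -- True
  print (concurrent a1 b1)      -- True
```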
For an example of simple tracing facilities, see http://www.erlang.org/doc/man/dbg.html.
> It would also be useful to add metrics such as message queue size (possibly broken down by message type).
Yes, that would be nice. However, I'm going to split it out into a separate issue, as it seems distinct from tracing/debugging.
There are some good ideas in this thread, so I'm going to change the title and relabel it as a Question, namely about distributed tracing and hooks into the runtime.
Erlang has tracing built into the runtime system; it is very lightweight and has little performance impact on the traced process. Traces can be set up to match processes (all, [pids...], named/registered, etc.), and flags can be turned on to trace calls to specific modules/functions/etc. Trace messages are sent to one or more tracer processes, which typically either write the trace data straight to a socket (to reduce the impact on the traced system) or print it to a file descriptor.
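A rough Haskell model of the matching step described above, with hypothetical Pid, TargetSpec, TraceFlag and TraceSpec types that loosely follow that description. These are not the real Erlang trace flags or any Cloud Haskell API; the sketch only shows how a "which processes, which events" spec could be checked before forwarding anything to a tracer.

```haskell
-- Sketch only: every type here is hypothetical.
module Main where

type Pid = Int

data ProcInfo = ProcInfo { procPid :: Pid, procName :: Maybe String }

-- Which processes to trace: all, an explicit list, or registered names.
data TargetSpec
  = AllProcesses
  | Pids [Pid]
  | Registered [String]

-- Which kinds of events to report.
data TraceFlag = TraceSend | TraceReceive | TraceSpawn
  deriving (Eq, Show)

data TraceSpec = TraceSpec { target :: TargetSpec, flags :: [TraceFlag] }

matches :: TargetSpec -> ProcInfo -> Bool
matches AllProcesses    _ = True
matches (Pids ps)       p = procPid p `elem` ps
matches (Registered ns) p = maybe False (`elem` ns) (procName p)

-- Should this event from this process be forwarded to the tracer(s)?
shouldTrace :: TraceSpec -> ProcInfo -> TraceFlag -> Bool
shouldTrace spec p f = matches (target spec) p && f `elem` flags spec

main :: IO ()
main = do
  let spec   = TraceSpec (Registered ["logger"]) [TraceSend]
      logger = ProcInfo 1 (Just "logger")
      worker = ProcInfo 2 Nothing
  print (shouldTrace spec logger TraceSend)     -- True
  print (shouldTrace spec logger TraceReceive)  -- False
  print (shouldTrace spec worker TraceSend)     -- False
```

Keeping this check cheap matters, since it would run on every event of every process even when nobody is tracing.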
I'm not sure how much of this makes sense for Cloud Haskell, but it would be good to see whether we can come up with a corresponding mechanism that lets us trace processes simply and efficiently. I don't think the typical traceEvent style (as in GHC's eventlog) would be useful here, but if a process's message queue could transparently forward messages to an additional tracer process (or process group), that would be useful!
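A minimal sketch of transparent forwarding, modelled with plain Control.Concurrent channels rather than real Cloud Haskell processes. The Mailbox type, newMailbox, addTracer and deliver are hypothetical; the point is that the traced process only ever reads its own queue, while tracers are attached from outside.

```haskell
-- Sketch only: Mailbox, addTracer and deliver are hypothetical names,
-- built on base's Chan/IORef rather than distributed-process.
module Main where

import Control.Concurrent (forkIO)
import Control.Concurrent.Chan
import Control.Concurrent.MVar
import Control.Monad (forM_, replicateM)
import Data.IORef

-- The process's own queue plus a mutable list of tracer queues that can
-- be attached at runtime without touching the traced process's code.
data Mailbox msg = Mailbox
  { mbQueue   :: Chan msg
  , mbTracers :: IORef [Chan msg]
  }

newMailbox :: IO (Mailbox msg)
newMailbox = Mailbox <$> newChan <*> newIORef []

addTracer :: Mailbox msg -> Chan msg -> IO ()
addTracer mb t = atomicModifyIORef' (mbTracers mb) (\ts -> (t : ts, ()))

-- Delivery enqueues for the owner and forwards a copy to every tracer.
-- Writing to an unbounded Chan never blocks, so a slow tracer does not
-- slow the sender down (at the cost of unbounded buffering).
deliver :: Mailbox msg -> msg -> IO ()
deliver mb m = do
  writeChan (mbQueue mb) m
  readIORef (mbTracers mb) >>= mapM_ (`writeChan` m)

main :: IO ()
main = do
  mb     <- newMailbox
  tracer <- newChan
  addTracer mb tracer
  done   <- newEmptyMVar
  -- Tracer "process": prints what it sees (a real one might use a socket).
  _ <- forkIO $ do
    seen <- replicateM 2 (readChan tracer)
    forM_ seen $ \m -> putStrLn ("trace: " ++ m)
    putMVar done ()
  deliver mb "hello"
  deliver mb "world"
  own <- replicateM 2 (readChan (mbQueue mb))
  print own
  takeMVar done
```

The output of the traced process and the tracer may interleave in any order, since they run on separate threads.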