haskell-distributed / distributed-process

Cloud Haskell core libraries
http://haskell-distributed.github.io

Should we provide deeper (and distributed) tracing facilities? #65

Open hyperthunk opened 12 years ago

hyperthunk commented 12 years ago

Erlang has tracing built in to the runtime system, which is very lightweight and has little runtime performance impact on the traced process. Traces can be set up to match processes (all, [pids...], named/registered, etc) and flags turned on to trace calls to specific modules/functions/etc. Traces are sent to one or more tracer processes, and these typically either throw the trace data straight on to a socket (to reduce impact on the traced system) or print to a file descriptor.

I'm not sure how much of this makes sense for Cloud Haskell, but it would be good to see if we can come up with some analogous mechanism that allows us to trace processes simply and efficiently. I don't think the typical traceEvent style would be useful here, but if the message queue for a process could be transparently used to forward messages to an additional tracer process (or process group) then that would be useful!
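As a rough illustration of the "forward messages to an additional tracer" idea, here is a minimal sketch using only `base` concurrency primitives. The `Mailbox` type and `deliver` function are hypothetical stand-ins invented for this example; they are not part of the distributed-process API, and a real implementation would have to live inside the node's message delivery path.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_, replicateM)

-- Hypothetical stand-in for a process mailbox (not the distributed-process API).
data Mailbox a = Mailbox
  { inbox  :: Chan a          -- messages the process itself reads
  , tracer :: Maybe (Chan a)  -- optional tracer that receives a copy
  }

-- Deliver a message, copying it to the tracer first if one is attached.
deliver :: Mailbox a -> a -> IO ()
deliver mb msg = do
  forM_ (tracer mb) (\t -> writeChan t msg)
  writeChan (inbox mb) msg

main :: IO ()
main = do
  box   <- newChan
  trace <- newChan
  done  <- newEmptyMVar
  let mb = Mailbox box (Just trace)
  -- The traced "process": reads three messages, then reports what it saw.
  _ <- forkIO $ do
    msgs <- replicateM 3 (readChan box)
    putMVar done msgs
  mapM_ (deliver mb) ["ping", "pong", "stop"]
  received <- takeMVar done
  traced   <- replicateM 3 (readChan trace)
  print received
  print traced
```

The traced process never sees the tracer, so tracing can be switched on or off by changing only the `tracer` field. The cost when tracing is off is a single `Maybe` check per delivery.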

edsko commented 12 years ago

Yes, debugging support is definitely something that would be worthwhile to add. It would also be useful to add metrics such as message queue size (possibly which types).

gbaz commented 12 years ago

Erlang uses Lamport clocks internally to give an ordering on the traces. As far as I can tell, the trick is that every message contains a (Maybe (Clock,Destination)) and the primitives will manage/pass that along if it exists. It would be relatively simple to extend the process internal state with an optional Lamport clock, and expose it directly. There's of course a small overhead even when the clock isn't in use, but I imagine it could be useful in a number of circumstances. My suggestion would be to have the state contain both a Maybe Clock and a [Trace Destinations], so that we decouple the ordering functionality given by the clock, which is useful even without a trace, from the trace functionality, which is a bit useful even without a clock.
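To make the token-passing idea concrete, here is a small pure sketch of a `Maybe (Clock, Destination)` carried on every message. All names here (`Token`, `Message`, `send`, `receive`) are hypothetical and chosen for this example only. Representing `Destination` as a `String` and letting the incoming destination win on a merge are assumptions, not something Erlang or Cloud Haskell prescribes.

```haskell
type Clock = Int
type Destination = String  -- assumption: name of a trace collector

-- The optional trace token carried by processes and messages.
type Token = Maybe (Clock, Destination)

data Message a = Message { token :: Token, payload :: a }
  deriving (Show, Eq)

-- Send: if the process holds a token, tick its clock and attach the token.
-- Returns the sender's updated token together with the outgoing message.
send :: Token -> a -> (Token, Message a)
send tok x = (tok', Message tok' x)
  where tok' = fmap (\(c, d) -> (c + 1, d)) tok

-- Receive: Lamport's rule, max of local and incoming clock, plus one.
-- A process without a token adopts the sender's; an untraced message
-- leaves the local token unchanged.
receive :: Token -> Message a -> Token
receive local (Message incoming _) =
  case (local, incoming) of
    (_, Nothing)               -> local
    (Nothing, Just (c, d))     -> Just (c + 1, d)
    (Just (l, _), Just (c, d)) -> Just (max l c + 1, d)

main :: IO ()
main = do
  let a0       = Just (0, "collector") :: Token  -- process A starts a trace
      (a1, m1) = send a0 "hello"                 -- A sends to B
      b1       = receive Nothing m1              -- B adopts A's token
      (_, m2)  = send b1 "reply"                 -- B replies to A
      a2       = receive a1 m2                   -- A merges B's clock
  print (m1, m2)
  print a2
```

Untraced messages carry `Nothing`, so the only overhead outside a trace is the pattern match on the token.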

edsko commented 12 years ago

I think we should separate out concerns about distributed ordering (like Lamport clocks), which can be implemented on top of the core infrastructure, from hooks into the guts of the system that allow the relevant information to be extracted. I'm not convinced that the core libraries need to do the former, but obviously they do need to do the latter.

gbaz commented 12 years ago

Sure -- my concern is just that traces are less useful if you don't have some ordering on causality. A trace mechanism that let userland Lamport clocks be hooked in (i.e. some customizable action to generate the traces) would indeed probably be cleaner and more elegant.
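One possible shape for such a hook is sketched below, purely as an assumption: the core would only know about a pair of user-supplied actions, polymorphic in the stamp type, and a userland package would supply the Lamport logic. The `Hooks` record, `noHooks`, and `lamportHooks` are hypothetical names invented for this example and do not exist in distributed-process.

```haskell
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)

-- Hypothetical hook interface: the core calls these but knows nothing
-- about what a stamp means.
data Hooks s = Hooks
  { onSend    :: IO s        -- produce a stamp for an outgoing message
  , onReceive :: s -> IO ()  -- observe the stamp on an incoming message
  }

-- Default: no tracing, unit stamps.
noHooks :: Hooks ()
noHooks = Hooks (pure ()) (const (pure ()))

-- Userland Lamport clock plugged into the hooks.
lamportHooks :: IORef Int -> Hooks Int
lamportHooks ref = Hooks
  { onSend    = atomicModifyIORef' ref (\c -> (c + 1, c + 1))
  , onReceive = \s -> atomicModifyIORef' ref (\c -> (max c s + 1, ()))
  }

main :: IO ()
main = do
  a <- newIORef 0
  b <- newIORef 0
  let ha = lamportHooks a
      hb = lamportHooks b
  s1 <- onSend ha   -- A sends to B
  onReceive hb s1   -- B merges A's stamp
  s2 <- onSend hb   -- B replies to A
  onReceive ha s2   -- A merges B's stamp
  readIORef a >>= print
  readIORef b >>= print
```

With this split, a vector-clock package could supply a different `Hooks` value without any change to the core.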

edsko commented 12 years ago

This causality is also the main difficulty in implementing generic distributed logging. Perhaps that's the core concept that should be implemented (as a separate package, distributed-process-logging perhaps).

hyperthunk commented 12 years ago

Erlang supports both kinds of tracing. The one @gbaz mentioned based on Lamport clocks is http://www.erlang.org/doc/man/seq_trace.html, whereas the dynamic process tracing I mentioned is a separate, complementary feature built into the runtime. A vclock-based tracing feature would be nice, but should be a separate package IMO.

For an example of simple tracing facilities, see http://www.erlang.org/doc/man/dbg.html.

hyperthunk commented 11 years ago

> It would also be useful to add metrics such as message queue size (possibly which types).

Yes that would be nice. I'm going to split it out into a separate issue however, as it seems distinct from tracing/debugging.

hyperthunk commented 7 years ago

There are some good ideas in this thread, so I'm going to change the title and move it to => Question, viz. distributed tracing and hooks into the runtime.