livebook-dev / kino

Client-driven interactive widgets for Livebook
Apache License 2.0
372 stars 65 forks source link

Add `Kino.Process.trace/2` #159

Closed josevalim closed 2 years ago

josevalim commented 2 years ago

Another idea is Kino.Process.trace/2:

Kino.Process.trace(pid, fn ->

end)

The above will trace all messages received by pid (or list of pids) and build a sequence diagram.

josevalim commented 2 years ago

Implementation wise, we should either use :erlang.trace or seq_trace. We need to do some proof of concepts to see what works. A very neat example could be Task.async_stream and see how it spawns a process and sends messages and processes terminates and stuff. :)

@akoutmos, let us know if you want to give it a try!

akoutmos commented 2 years ago

This sounds like something I could do! I can start putting together a PoC and see where I land in the next couple of days :).

akoutmos commented 2 years ago

I put together a really simple PoC using seq_trace and ran the following code during a trace:

1..5
|> Task.async_stream(fn num ->
  100 * num
end,
max_concurrency: 5)
|> Stream.run()

The connections in the sequence diagrams are formatted using the following code as not to overflow the Mermaid diagram:

formatted_message = 
    case message do
      {:EXIT, _, reason} -> "EXIT - #{reason}"
      {:spawn_request, _, _, _, _, _, _, _} -> "spawn_request"
      {:spawn_reply, _, status, _} -> "spawn_reply - #{status}"
      {:down, _, status} -> "down - #{status}"
      {:spawn, _} -> "spawn"
      _ -> ""
    end

Is this sort of what you had in mind for this feature?

image

josevalim commented 2 years ago

Yes, this is pretty much the building block! It is awesome! Some questions:

  1. can we make the timeline start as soon as the line receives the spawn event, or do they always start at the top?
  2. similarly, can we make them stop on exit?
  3. You can definitely hide stream_reply as that is internal, everything else you can inspect but probably trim the characters print?
  4. Do you know what is the last process at the end?
  5. Also let’s show the registered name if available :)
  6. Can you please PR it as Process.seq_trace? :)
akoutmos commented 2 years ago

Thanks for the feedback :). I'll clean up what I have and open up a PR tomorrow. Below are the answers to your questions:

1 & 2. As far as I can tell from the Mermaid docs the labels are always at the top and bottom. I did add actor activation and actor deactivation blocks though so it is clear when a process started and ended (starts with the spawn_request and ends with the EXIT). See the picture below.

  1. We can experiment with this. I tried initially showing part of the message, but the ref strings are very long and make the diagram too small even on my 5k display. Perhaps we show the message (limited number of chars) only when it is not a known message type like spawn_request, exit or stop?
  2. The last process at the end was the last task, but I ran out of vertical screen shot space so the sequence didn't show up. The new screenshot has fewer tasks so you can see the whole diagram :D.
  3. Will do!
  4. Will do!

image

josevalim commented 2 years ago

Beautiful, I loved the bars! Some additional points:

  1. Should we not show the bars if the process already exists or should we show the bar from beginning to end?

  2. Regarding the messages, can we do any sort of mouse over thing? Keep only the label and when you mouse over, you see the whole message? Speaking about labels, we will need to come up with a generic scheme. I would probably do this:

a. For spawn messages, show SPAWN b. For EXIT messages, show "EXIT: normal | shutdown | abnormal" (where is normal is :normal, shutdown either :shutdown, {:shutdown, ...}, and abnormal is everything else c. For all other messages, we should show: the atom value if an atom, the first tuple element if a tuple and the first element is an atom, or just a generic value. However, we need to handle call/cast explicitly. I have some footnote code for this.

  1. There are some dotted arrows after the messages... is this the "receive" event? If so, I think we can ignore it. Another option would be for the arrows to not be horizontal, but point to when the messages are received in the other process timeline, but that may be too complicated?

Messages. call and cast:

iex(1)> pid = spawn fn -> receive do msg -> IO.inspect msg end end
#PID<0.114.0>
iex(2)> GenServer.call(pid, :foo)
{:"$gen_call",
 {#PID<0.110.0>, [:alias | #Reference<0.1727725958.4045209604.214279>]}, :foo}
** (exit) exited in: GenServer.call(#PID<0.114.0>, :foo, 5000)
    ** (EXIT) normal
    (elixir 1.14.0-dev) lib/gen_server.ex:1038: GenServer.call/3
    iex:2: (file)
iex(2)> pid = spawn fn -> receive do msg -> IO.inspect msg end end
#PID<0.119.0>
iex(3)> GenServer.cast(pid, :foo)
{:"$gen_cast", :foo}
:ok

So we probably need code like this:

    defp label_from_message(message) do
      case message do
        {:EXIT, _, reason} -> "EXIT: #{label_from_reason(reason)}"
        {:spawn_request, _, _, _, _, _, _, _} -> "SPAWN"
        {:spawn_reply, _, status, _} -> nil # ignore
        {:"$gen_call", _ref, value} -> "CALL: #{label_from_value(value)}"
        {:"$gen_cast", value} -> "CAST: #{label_from_value(value)}"
        value -> "INFO: #{label_from_value(value)}"
      end
    end

    defp label_from_reason(:normal), do: "normal"
    defp label_from_reason(:shutdown), do: "shutdown"
    defp label_from_reason({:shutdown, _}), do: "shutdown"
    defp label_from_reason(_), do: "abnormal"

    defp label_from_value(atom) when is_atom(atom), do: atom
    defp label_from_value(tuple) when is_tuple(tuple) and is_atom(elem(tuple, 0)), do: elem(tuple, 0)
    defp label_from_value(tuple) when is_tuple(tuple), do: "tuple"
    defp label_from_value(_), do: "term"

We also need to check if we trace :DOWN messages from Process.monitor. If we do:

    {:DOWN, _, :process, _, reason} -> "DOWN: #{label_from_reason(reason)}"

This is very exciting!!!