grisp / dgram_logger

Generic datagram logger which sends to various UDP based logging infrastructure
Apache License 2.0

Format supervisor messages nicer, especially crash dumps #7

Open peerst opened 5 years ago

peerst commented 5 years ago

When we get not only progress reports but also a crash dump, the formatting functions cannot handle it: the passed list is not of the form [{K1, V1}, {K2, V2}, ...] but contains a nested list, like:

(<8964.1345.0>) call dgram_logger:report_values([{proc_lib,crash},
 [{initial_call,{simple_ctrl,init,['Argument__1']}},
  {pid,<8964.1345.0>},
  {registered_name,simple_ctrl},
  {error_info,
      {error,
          {onewire,nothing_present},
          [{onewire_ds18b20,select_device,1,
               [{file,
                    "/Users/peer/projects/twitch/gsolctrl/_build/default/lib/grisp/src/onewire_ds18b20.erl"},
                {line,45}]},
           {onewire_ds18b20,'-convert/2-fun-0-',2,
               [{file,
                    "/Users/peer/projects/twitch/gsolctrl/_build/default/lib/grisp/src/onewire_ds18b20.erl"},
                {line,32}]},
           {grisp_onewire,handle_call,3,
               [{file,
                    "/Users/peer/projects/twitch/gsolctrl/_build/default/lib/grisp/src/grisp_onewire.erl"},
                {line,133}]},
           {gen_server,try_handle_call,4,[{file,"gen_server.erl"},{line,661}]},
           {gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,690}]},
           {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}},
  {ancestors,[gsolctrl_sup,<8964.106.0>]},
  {message_queue_len,1},
  {messages,[measure]},
  {links,[<8964.107.0>,<8964.110.0>]},
  {dictionary,[]},
  {trap_exit,false},
  {status,running},
  {heap_size,6772},
  {stack_size,27},
  {reductions,284290}],
 []]) ({dgram_logger,send,4})
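
One way to cope with such input is to normalise the report before formatting it. The following is only a minimal sketch, assuming the formatter wants a flat list of {Key, Value} pairs; the module `report_flatten` and its function are hypothetical and not part of dgram_logger. Nested lists at the top level are spliced in place, while nested values (such as `error_info`) are left untouched.

```erlang
-module(report_flatten).
-export([flatten/1]).

%% Hypothetical helper (not part of dgram_logger): turn a report such as
%% [{proc_lib,crash}, [{pid,P}, {status,running}], []] into a flat list
%% of {Key, Value} pairs.
-spec flatten(list()) -> [{term(), term()}].
flatten(Report) when is_list(Report) ->
    lists:flatmap(fun flatten_element/1, Report).

%% A pair is kept as it is; its value is not inspected any further.
flatten_element({Key, Value}) ->
    [{Key, Value}];
%% A nested list is flattened recursively; an empty list yields nothing.
flatten_element(Nested) when is_list(Nested) ->
    flatten(Nested);
%% Anything else is kept under a generic key instead of crashing.
flatten_element(Other) ->
    [{unknown, Other}].
```

With this, `report_flatten:flatten([{proc_lib,crash}, [{status,running}], []])` would give a list containing only the two pairs. Whether flattening or a dedicated crash-report formatter is the better fix is left open here.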

eproxus commented 4 years ago

The measurement name is set globally for each handler. If we want to send SASL reports or other OTP metadata (like memory statistics), we might want to use different measurement names for different categories of events.

For a supervisor crash, for example, we get this data:

dgram_logger:log(#{level => error,
  meta =>
      #{domain => [otp,sasl],
        error_logger => #{tag => error_report,type => supervisor_report},
        file => "supervisor.erl",gl => <0.223.0>,line => 701,
        logger_formatter => #{title => "SUPERVISOR REPORT"},
        mfa => {supervisor,do_restart,3},
        pid => <0.225.0>,report_cb => fun logger:format_otp_report/1,
        time => 1569603760792635},
  msg =>
      {report,#{label => {supervisor,child_terminated},
                report =>
                    [{supervisor,{local,ssh_sup}},
                     {errorContext,child_terminated},
                     {reason,killed},
                     {offender,[{pid,<0.294.0>},
                                {id,sshc_sup},
                                {mfargs,{sshc_sup,start_link,[]}},
                                {restart_type,permanent},
                                {shutdown,infinity},
                                {child_type,supervisor}]}]}}},
  #{config =>
      #{host => {127,0,0,1},
        measurement => <<"sol">>,pid => <0.281.0>,port => 8089,
        send_timestamp => true,sock => #Port<0.14>},
  formatter => {logger_formatter,#{}},
  id => foo,module => dgram_logger})

We need to decide whether to override the measurement name (or make it configurable per event type), and which values should be tags and which should be fields.
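
To make the "configurable per event type" option concrete, here is a rough sketch. It assumes a new, hypothetical `measurements` key in the inner handler config map (the one that already holds `measurement`), mapping the `type` found under `error_logger` in the event metadata to a measurement name. Neither that key nor the module `measurement_sketch` exists in dgram_logger today, and the tags/fields question is not addressed.

```erlang
-module(measurement_sketch).
-export([measurement/2]).

%% Hypothetical: choose the measurement name for a log event.
%% Config is the inner handler config map, e.g.
%%   #{measurement => <<"sol">>,
%%     measurements => #{supervisor_report => <<"sasl">>}}
%% where `measurements` is an assumed, not yet existing, key.
measurement(#{meta := #{error_logger := #{type := Type}}},
            #{measurement := Default} = Config) ->
    Overrides = maps:get(measurements, Config, #{}),
    %% Use the per-type name if one is configured, else the global one.
    maps:get(Type, Overrides, Default);
%% Events without error_logger metadata use the global name.
measurement(_Event, #{measurement := Default}) ->
    Default.
```

For the supervisor report above, this would select the name configured for `supervisor_report` and fall back to `<<"sol">>` when no override is present.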