janestreet / magic-trace

magic-trace collects and displays high-resolution traces of what a process is doing
https://magic-trace.org
MIT License

Handle `tr end async` trace events #288

Closed · Xyene closed this 6 months ago

Xyene commented 6 months ago

Kernel commit

https://lore.kernel.org/lkml/20230928072953.19369-1-adrian.hunter@intel.com/

sets PERF_IP_FLAG_ASYNC on trace end events. As per

https://github.com/torvalds/linux/blob/7ee022567bf9e2e0b3cd92461a2f4986ecc99673/tools/perf/builtin-script.c#L1546

I think this means that where perf would previously have output `tr end jmp`, it now outputs `tr end async`.

Adjust our regex to allow `async`, and treat it as a branch.
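
For illustration, here is a minimal sketch of the idea (not magic-trace's actual parser; the type, function, and token names are illustrative): classify the flag field printed by `perf script -F flags`, accepting `async` as another branch-like token.

```ocaml
(* Minimal sketch, not the real magic-trace code: classify the flag field
   printed by `perf script -F flags`, e.g. "tr end jmp" or, with the kernel
   commit above, "tr end async". Token names are illustrative. *)
type kind = Call | Return | Jump

let kind_of_flags flags =
  let tokens = String.split_on_char ' ' flags in
  let has tok = List.mem tok tokens in
  if has "call" then Some Call
  else if has "return" then Some Return
  (* New: accept "async" and treat it like a branch, mirroring this PR's
     change to the regex-based parser. *)
  else if has "jmp" || has "async" then Some Jump
  else None

let () =
  assert (kind_of_flags "tr end async" = Some Jump);
  assert (kind_of_flags "tr end jmp" = Some Jump)
```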

Xyene commented 6 months ago

@pikrzysztof, @wbthomason: this should fix https://github.com/janestreet/magic-trace/issues/257#issuecomment-1872989796. Could you please give it a try?

You can get a prebuilt binary from the GitHub Actions run: https://github.com/janestreet/magic-trace/actions/runs/7449992981

pikrzysztof commented 6 months ago

Thanks for the quick contribution! This still crashes for me, but with a different error:

~/dev/web/org/queues/utilization-latency utilization-latency 17:11:34 RC=1» MAGIC_TRACE_DEBUG=1 ./magic-trace run -z-print-perf-commands ./target/debug/utilization-latency
perf record -o /tmp/magic_trace.tmp.c503dc/perf.data --timestamp --event=intel_pt/cyc=1,cyc_thresh=1,mtc_period=0/u --per-thread -t 120457 --snapshot=e
[ perf record: Woken up 1 times to write data ]
[ Snapshot taken. ]
[ perf record: Captured and wrote 0.128 MB /tmp/magic_trace.tmp.c503dc/perf.data ]
[ Finished recording. ]
[ Decoding, this takes a while... ]
perf script -i /tmp/magic_trace.tmp.c503dc/perf.data --ns --itrace=bep -F pid,tid,time,flags,ip,addr,sym,symoff,synth,dso,event,period --dlfilter /tmp/magic_trace.tmp.c503dc/perf_dlfilter.so
(monitor.ml.Error
 ("BUG: magic-trace devs thought this event was impossible, but you just proved them wrong. Please report this to https://github.com/janestreet/magic-trace/issues/"
  (event
   (Ok
    ((thread ((pid (120457)) (tid (120457)))) (time 7h29m46.678855932s)
     (data
      (Trace (trace_state_change End) (kind Jump) (src 0x562d04f3a2f0)
       (dst 0x0)))))))
 ("Raised at Base__Error.raise in file \"src/error.ml\" (inlined), line 9, characters 14-30"
  "Called from Base__Error.raise_s in file \"src/error.ml\", line 10, characters 19-40"
  "Called from Magic_trace_lib__Trace_writer.write_event in file \"src/trace_writer.ml\", line 1035, characters 10-285"
  "Called from Async_kernel__Pipe.iter_without_pushback.(fun).loop in file \"src/pipe.ml\", line 894, characters 10-13"
  "Called from Async_kernel__Job_queue.run_jobs in file \"src/job_queue.ml\", line 167, characters 6-47"
  "Caught by monitor Monitor.protect"))
Xyene commented 6 months ago

Thanks for testing. I've pushed some changes that might resolve this; unfortunately, I don't have an appropriate version of perf to test with, so there may be lingering issues.

https://github.com/janestreet/magic-trace/actions/runs/7453308572/job/20278398604

pikrzysztof commented 6 months ago

Seems to do the job. I'm not quite sure everything is there as needed, but nothing crashes hard despite a warning. trace.fxt.gz

~/dev/web/org/queues/utilization-latency utilization-latency 8:28:49 RC=0» MAGIC_TRACE_DEBUG=1 ./magic-trace run -z-print-perf-commands ./target/debug/utilization-latency
perf record -o /tmp/magic_trace.tmp.abda1d/perf.data --timestamp --event=intel_pt/cyc=1,cyc_thresh=1,mtc_period=0/u --per-thread -t 10439 --snapshot=e
[ Snapshot taken. ]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.127 MB /tmp/magic_trace.tmp.abda1d/perf.data ]
[ Finished recording. ]
[ Decoding, this takes a while... ]
perf script -i /tmp/magic_trace.tmp.abda1d/perf.data --ns --itrace=bep -F pid,tid,time,flags,ip,addr,sym,symoff,synth,dso,event,period --dlfilter /tmp/magic_trace.tmp.abda1d/perf_dlfilter.so
Warning:
2 instruction trace errors
Visit https://magic-trace.org/ and open trace.fxt to view trace.

I'll probably dig deeper into the usefulness of this trace later and report any other issues I find. Thank you!

Xyene commented 6 months ago

Thanks for testing! Your trace has some overflow packets recorded; those map to data being lost because the system under test ran out of memory bandwidth.