janestreet / magic-trace

magic-trace collects and displays high-resolution traces of what a process is doing
https://magic-trace.org
MIT License

Changing c-states breaks magic-trace / IPT #3

Open StephanDollberg opened 2 years ago

StephanDollberg commented 2 years ago

I was experiencing issues where my traces were completely empty around the snapshot. Inspecting the perf file pointed to "instruction trace errors", which led me to

https://perf.wiki.kernel.org/index.php/Perf_tools_support_for_Intel%C2%AE_Processor_Trace

which mentions

It is not uncommon to get overflows when transitioning to a C-state, so these errors are not significant.

I was testing this on a TGL laptop and after disabling turbo boost I got pretty stable traces again.

I am wondering whether other people share the same experience with switching c-states or whether there is maybe something else behind it?

If not, it might be worth mentioning disabling c-states in the readme / tutorial? Disabling turbo boost was enough for me, but something like the max_cstate kernel flags would probably work as well.

This is on 5.15.17.
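For anyone who wants to check which c-states their machine can actually enter before changing kernel flags, here is a minimal sketch that reads the Linux cpuidle sysfs entries. It assumes the standard `/sys/devices/system/cpu/cpuN/cpuidle/stateM/` layout with `name`, `latency` and `disable` files; the function name `list_idle_states` and its parameters are made up for illustration.

```python
from pathlib import Path


def list_idle_states(cpu=0, sysfs_root="/sys/devices/system/cpu"):
    """Return (name, exit latency in us, disabled) for each idle state of one CPU.

    Assumes the usual cpuidle sysfs layout; returns an empty list if the
    directory does not exist (e.g. cpuidle is not available).
    """
    base = Path(sysfs_root) / f"cpu{cpu}" / "cpuidle"
    # Sort numerically so that state10 does not sort before state2.
    state_dirs = sorted(
        base.glob("state[0-9]*"),
        key=lambda p: int(p.name[len("state"):]),
    )
    states = []
    for d in state_dirs:
        name = (d / "name").read_text().strip()
        latency_us = int((d / "latency").read_text())
        disabled = (d / "disable").read_text().strip() != "0"
        states.append((name, latency_us, disabled))
    return states
```

Calling `list_idle_states()` on the affected machine and printing the result shows which deep states (large exit latency) are still enabled.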

gretay-js commented 2 years ago

I haven't seen this problem, but we will update the tutorial as you suggested, thank you. Magic trace reads events via "perf script --itrace=b". Perhaps we should use "--itrace=be" and do something more user-friendly when we see overflow packet events, especially if there aren't any other events.
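As a rough illustration of the "more user-friendly" idea, the sketch below counts decoder error lines in text produced by `perf script --itrace=be`. It is not how magic-trace processes events. The regular expression assumes error lines look roughly like `instruction trace error type 1 ... code 8: Lost trace data`, which may vary between perf versions, and `summarize_trace_errors` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed shape of a perf decoder error line; real output may differ.
ERROR_RE = re.compile(r"instruction trace error .*?code (\d+): (.*)$")


def summarize_trace_errors(lines):
    """Count decoder errors by message in an iterable of perf script lines."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[match.group(2).strip()] += 1
    return counts
```

A front end could warn when this counter is non-empty and the number of ordinary branch events is zero, which matches the "completely empty trace" symptom described above.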

Xyene commented 2 years ago

Out of curiosity, is opening /dev/cpu_dma_latency and writing 0 to it sufficient to prevent the issues for you, or did you have to disable turbo entirely? I haven't encountered issues with magic-trace on server processors, but I can imagine mobile being different. I'm thinking of a one-liner like

sudo python3 -c "f = open('/dev/cpu_dma_latency', 'w'); f.write('0'); f.flush(); input('Press Enter to exit\n')"

before running magic-trace. If that works, I don't think it would be unreasonable to add this behavior as a flag to magic-trace.
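If the one-liner helps, the same idea could be wrapped around a command so the latency request is released automatically when the command exits. This is only a sketch: it relies on the kernel keeping the PM QoS request active for as long as the file stays open and accepting a 4-byte write as a binary 32-bit value, and the helper names `hold_cpu_dma_latency` and `run_with_low_latency` are made up here.

```python
import contextlib
import struct
import subprocess


@contextlib.contextmanager
def hold_cpu_dma_latency(target_us=0, path="/dev/cpu_dma_latency"):
    """Request a maximum CPU wake-up latency for the duration of the block."""
    # The request is dropped as soon as the file is closed, so keep it open.
    with open(path, "wb", buffering=0) as f:
        # Signed 32-bit value in native byte order.
        f.write(struct.pack("i", target_us))
        yield


def run_with_low_latency(argv, path="/dev/cpu_dma_latency"):
    """Run a command while the latency request is held; return its exit code."""
    with hold_cpu_dma_latency(0, path):
        return subprocess.run(argv).returncode
```

Run as root, something like `run_with_low_latency(["magic-trace", "attach", "-pid", "1234"])` would keep deep c-states out of the picture for the duration of the trace; the command line is only illustrative.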

cgaebel commented 2 years ago

#33 might also help with this. Assuming this is transient latency, magic-trace would be less likely to fall behind if it has a bigger snapshot buffer.