intel / PerfSpect

System performance characterization tool based on linux perf
https://intel.github.io/PerfSpect/
BSD 3-Clause "New" or "Revised" License

(WIP) Add cpu mode #31

Closed: changzhi1990 closed this 8 months ago

changzhi1990 commented 1 year ago

Add a new argument, "--cpu", to collect perf events from specific CPU cores only. Also add two new fields, "Cpu count" and "Percpu mode", to the csv file.

In microservice scenarios, we often bind services to specific CPU cores to achieve better performance, and we need to collect perf events and metrics while those microservices are running. Passing the --cpu argument through to the perf stat command lets us collect metrics for just those cores.
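As a rough illustration of what a "Cpu count" value could be derived from, here is a minimal sketch that expands a perf-style CPU list such as "3-4,5-8" into individual core IDs and builds a matching perf stat command line. The function names, the event list, and the use of sleep as the workload are assumptions made for this example, not code from this PR. The -a and -C options are standard perf stat options.

```python
def parse_cpu_list(spec):
    """Expand a perf-style CPU list such as "3-4,5-8" into sorted, unique core IDs."""
    cores = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"invalid CPU range: {part!r}")
            cores.update(range(start, end + 1))
        else:
            cores.add(int(part))
    return sorted(cores)


def build_perf_stat_command(spec, events, seconds):
    """Build (but do not run) a perf stat command restricted to the given cores.

    Hypothetical helper: the real perf-collect.py may assemble its command differently.
    """
    return [
        "perf", "stat",
        "-a",              # system-wide mode, required for -C to take effect
        "-C", spec,        # count only on the listed CPUs
        "-e", ",".join(events),
        "--", "sleep", str(seconds),
    ]


cores = parse_cpu_list("3-4,5-8")
cpu_count = len(cores)  # candidate value for a "Cpu count" field
command = build_perf_stat_command("3-4,5-8", ["cycles", "instructions"], 3)
```

Expanding the list first also makes it easy to validate the argument before perf is launched, instead of relying on perf to reject a malformed value.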

changzhi1990 commented 1 year ago

This PR isn't finished yet, and it currently fails with an error:

ubuntu@inspur-icx-1:~/zhi/code/PerfSpect$ sudo python perf-collect.py --cpu 3-4,5-8 --timeout 3
2023-05-16 10:35:14,798 INFO: nmi_watchdog disabled!
2023-05-16 10:35:15,085 INFO: Only CPU/core 3-4,5-8 events will be enabled with cpu option
2023-05-16 10:35:15,204 INFO: PMUs not in use
2023-05-16 10:35:15,204 INFO: Collecting perf stat for events in : /home/ubuntu/zhi/code/PerfSpect/events/icx.txt
2023-05-16 10:35:18,442 INFO: Collection complete! Calculating TSC frequency now
2023-05-16 10:35:19,552 INFO: perf stat dumped to perfstat.csv
ubuntu@inspur-icx-1:~/zhi/code/PerfSpect$ sudo python perf-postprocess.py  --html cpu.html
Traceback (most recent call last):
  File "/home/ubuntu/zhi/code/PerfSpect/perf-postprocess.py", line 906, in <module>
    generate_metrics(
  File "/home/ubuntu/zhi/code/PerfSpect/perf-postprocess.py", line 621, in generate_metrics
    if row["metric"] in event_groups["group_" + str(current_group_indx)]:
KeyError: 'group_34'

This needs deeper investigation.

changzhi1990 commented 1 year ago

Here is the csv file perfstat.csv

changzhi1990 commented 1 year ago

UPDATED

This PR still hits a KeyError and I don't know what causes it. Could someone give me some advice?

ubuntu@inspur-icx-1:~/zhi/code/PerfSpect$ sudo python perf-collect.py --cpu 3 --timeout 3
2023-05-22 15:57:07,545 INFO: nmi_watchdog disabled!
2023-05-22 15:57:07,824 INFO: Only CPU/core 3 events will be enabled with cpu option
2023-05-22 15:57:07,898 INFO: PMUs not in use
2023-05-22 15:57:07,898 INFO: Collecting perf stat for events in : /home/ubuntu/zhi/code/PerfSpect/events/icx.txt
2023-05-22 15:57:11,010 INFO: Collection complete! Calculating TSC frequency now
2023-05-22 15:57:12,096 INFO: perf stat dumped to perfstat.csv
ubuntu@inspur-icx-1:~/zhi/code/PerfSpect$ sudo python perf-postprocess.py  --html cpu.html
Traceback (most recent call last):
  File "/home/ubuntu/zhi/code/PerfSpect/perf-postprocess.py", line 910, in <module>
    generate_metrics(
  File "/home/ubuntu/zhi/code/PerfSpect/perf-postprocess.py", line 625, in generate_metrics
    if row["metric"] in event_groups["group_" + str(current_group_indx)]:
KeyError: 'group_22'

Here is the csv file perfstat.csv

vsoch commented 9 months ago

I'm not familiar with this tool (just taking a look today), but I suspect the keys you have for your groups are off. Maybe a group number is expected to exist that does not? Could you check whether the key exists before trying to use it?
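A minimal sketch of that kind of guard is shown here, assuming event_groups is a dict that maps "group_N" keys to collections of event names, as the traceback suggests. The helper name and the sample data are hypothetical and not taken from perf-postprocess.py.

```python
import logging


def metric_in_group(event_groups, group_index, metric):
    """Return True if metric belongs to the indexed group; warn if the group is missing."""
    key = "group_" + str(group_index)
    group = event_groups.get(key)
    if group is None:
        # Report which keys do exist so the index mismatch is easier to trace.
        logging.warning("%s not found; known groups: %s", key, sorted(event_groups))
        return False
    return metric in group


# Sample data for illustration only.
sample_groups = {"group_0": ["cycles", "instructions"], "group_1": ["ref-cycles"]}
found = metric_in_group(sample_groups, 0, "cycles")
missing = metric_in_group(sample_groups, 22, "cycles")
```

A guard like this only avoids the crash and surfaces the mismatch. It does not explain why the group index runs past the groups that were parsed, so the underlying cause would still need to be found.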

hilldani commented 9 months ago

@changzhi1990 we added better debugging output for grouping errors. It might be worth updating to the latest version for future work in this PR.

changzhi1990 commented 8 months ago

Hi all. I will reopen this after some debugging. Thanks.