perf is used to profile a specific program or an entire system: it shows when, how often, and how performance events (CPU cycles, system calls, cache misses, page faults, context switches) occur.
Language
Usage
kibeom@kibeom-VirtualBox:~$ perf
usage: perf [--version] [--help] COMMAND [ARGS]
The most commonly used perf commands are:
   annotate        Read perf.data (created by perf record) and display annotated code
   archive         Create archive with object files with build-ids found in perf.data file
   bench           General framework for benchmark suites
   buildid-cache   Manage build-id cache.
   buildid-list    List the buildids in a perf.data file
   diff            Read perf.data files and display the differential profile
   evlist          List the event names in a perf.data file
   inject          Filter to augment the events stream with additional information
   kmem            Tool to trace/measure kernel memory(slab) properties
   kvm             Tool to trace/measure kvm guest os
   list            List all symbolic event types
   lock            Analyze lock events
   mem             Profile memory accesses
   record          Run a command and record its profile into perf.data
   report          Read perf.data (created by perf record) and display the profile
   sched           Tool to trace/measure scheduler properties (latencies)
   script          Read perf.data (created by perf record) and display trace output
   stat            Run a command and gather performance counter statistics
   test            Runs sanity tests.
   timechart       Tool to visualize total system behavior during a workload
   top             System profiling tool.
   trace           strace inspired tool
   probe           Define new dynamic tracepoints
See 'perf help COMMAND' for more information on a specific command.
List of available events
List of pre-defined events (to be used in -e):
  cpu-clock                                [Software event]
  task-clock                               [Software event]
  page-faults OR faults                    [Software event]
  context-switches OR cs                   [Software event]
  cpu-migrations OR migrations             [Software event]
  minor-faults                             [Software event]
  major-faults                             [Software event]
  alignment-faults                         [Software event]
  emulation-faults                         [Software event]
  dummy                                    [Software event]
  rNNN                                     [Raw hardware event descriptor]
  cpu/t1=v1[,t2=v2,t3 ...]/modifier        [Raw hardware event descriptor]
   (see 'man perf-list' on how to encode it)
  mem:<addr>[:access]                      [Hardware breakpoint]
Example
$ cat hello.c
#include <stdio.h>
#include <pthread.h>

#define NUM_THREAD 10
#define NUM_INCREASE 1000000

int cnt_global = 0;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

/* Each thread takes the mutex once and holds it for its whole loop,
   so the threads run their loops one after another. */
void* ThreadFunc(void* arg) {
    (void) arg;  // unused
    long cnt_local = 0;

    pthread_mutex_lock(&mutex);
    for (int i = 0; i < NUM_INCREASE; i++) {
        cnt_global++;
        cnt_local++;
    }
    pthread_mutex_unlock(&mutex);

    return (void*) cnt_local;
}

int main(void) {
    pthread_t threads[NUM_THREAD];

    // create threads
    for (int i = 0; i < NUM_THREAD; i++) {
        // pthread_create returns 0 on success and a positive error number on failure
        if (pthread_create(&threads[i], NULL, ThreadFunc, NULL) != 0) {
            printf("pthread_create error!\n");
            return 1;
        }
    }

    // wait for the threads to finish
    for (int i = 0; i < NUM_THREAD; i++) {
        void* ret;
        pthread_join(threads[i], &ret);
        // pthread_t is an unsigned long on Linux/glibc; cast explicitly for printf
        printf("thread %lu, local count : %ld\n", (unsigned long) threads[i], (long) ret);
    }

    printf("global count : %d\n", cnt_global);
    return 0;
}
This code creates 10 threads, each of which increments a global variable 1,000,000 times, so the final global count is 10,000,000.
$ sudo perf stat ./hello
thread 139765062092544, local count : 1000000
thread 139765053699840, local count : 1000000
thread 139765045307136, local count : 1000000
thread 139765036914432, local count : 1000000
thread 139765028521728, local count : 1000000
thread 139765020129024, local count : 1000000
thread 139765011736320, local count : 1000000
thread 139765003343616, local count : 1000000
thread 139764994950912, local count : 1000000
thread 139764986558208, local count : 1000000
global count : 10000000
 Performance counter stats for './hello':

         23.974106 task-clock (msec)         #    0.962 CPUs utilized
                53 context-switches          #    0.002 M/sec
                10 cpu-migrations            #    0.417 K/sec
                80 page-faults               #    0.003 M/sec
   <not supported> cycles
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
   <not supported> instructions
   <not supported> branches
   <not supported> branch-misses

       0.024918035 seconds time elapsed
Linux-Perf