kosslab-kr / linux-perf

:rocket: perf contribution (mirrored from git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git)

[권기범] Project introduction document #179

Open bumkeyy opened 7 years ago

bumkeyy commented 7 years ago

Linux-Perf

perf is a performance analysis tool for profiling a specific program or the system as a whole. It is used to examine when, how often, and how events (cpu-cycles, system calls, cache-misses, page-faults, context-switches) occurred.

Language

Usage

kibeom@kibeom-VirtualBox:~$ perf

 usage: perf [--version] [--help] COMMAND [ARGS]

 The most commonly used perf commands are:
   annotate        Read perf.data (created by perf record) and display annotated code
   archive         Create archive with object files with build-ids found in perf.data file
   bench           General framework for benchmark suites
   buildid-cache   Manage build-id cache.
   buildid-list    List the buildids in a perf.data file
   diff            Read perf.data files and display the differential profile
   evlist          List the event names in a perf.data file
   inject          Filter to augment the events stream with additional information
   kmem            Tool to trace/measure kernel memory(slab) properties
   kvm             Tool to trace/measure kvm guest os
   list            List all symbolic event types
   lock            Analyze lock events
   mem             Profile memory accesses
   record          Run a command and record its profile into perf.data
   report          Read perf.data (created by perf record) and display the profile
   sched           Tool to trace/measure scheduler properties (latencies)
   script          Read perf.data (created by perf record) and display trace output
   stat            Run a command and gather performance counter statistics
   test            Runs sanity tests.
   timechart       Tool to visualize total system behavior during a workload
   top             System profiling tool.
   trace           strace inspired tool
   probe           Define new dynamic tracepoints

 See 'perf help COMMAND' for more information on a specific command.

List of available events

List of pre-defined events (to be used in -e):
  cpu-clock                                          [Software event]
  task-clock                                         [Software event]
  page-faults OR faults                              [Software event]
  context-switches OR cs                             [Software event]
  cpu-migrations OR migrations                       [Software event]
  minor-faults                                       [Software event]
  major-faults                                       [Software event]
  alignment-faults                                   [Software event]
  emulation-faults                                   [Software event]
  dummy                                              [Software event]

  rNNN                                               [Raw hardware event descriptor]
  cpu/t1=v1[,t2=v2,t3 ...]/modifier                  [Raw hardware event descriptor]
   (see 'man perf-list' on how to encode it)

  mem:<addr>[:access]                                [Hardware breakpoint]
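
The software events in the list above can also be read from a program through the Linux perf_event_open system call. The following is a minimal sketch, not the way the perf tool itself is implemented: count_sw_event, open_counter and touch_pages are hypothetical helper names introduced for illustration, and setting exclude_kernel is an assumption made so that the call has a chance of succeeding without root.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: glibc has no perf_event_open wrapper, so use syscall(2). */
static int open_counter(struct perf_event_attr *attr) {
    /* pid = 0, cpu = -1: count for the calling thread on whichever CPU it runs. */
    return (int)syscall(SYS_perf_event_open, attr, 0, -1, -1, 0UL);
}

/* Hypothetical helper: counts one software event while work() runs.
 * Returns 0 and stores the count on success, -1 if the counter is unavailable. */
int count_sw_event(uint64_t config, void (*work)(void), uint64_t *count_out) {
    assert(work != NULL && count_out != NULL);

    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_SOFTWARE;   /* the "[Software event]" class */
    attr.size = sizeof(attr);
    attr.config = config;             /* e.g. PERF_COUNT_SW_PAGE_FAULTS */
    attr.disabled = 1;                /* start stopped; enabled explicitly below */
    attr.exclude_kernel = 1;          /* assumption: helps unprivileged runs succeed */
    attr.exclude_hv = 1;

    int fd = open_counter(&attr);
    if (fd < 0)
        return -1;                    /* not permitted or not supported here */

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t value = 0;
    ssize_t n = read(fd, &value, sizeof(value));
    close(fd);
    if (n != (ssize_t)sizeof(value))
        return -1;

    *count_out = value;
    return 0;
}

/* Example workload: write one byte per 4 KiB step of a freshly allocated 1 MiB buffer. */
void touch_pages(void) {
    const size_t size = (size_t)1 << 20, step = 4096;
    volatile char *buf = malloc(size);
    if (buf == NULL)
        return;
    for (size_t i = 0; i < size; i += step)
        buf[i] = 1;
    free((void *)buf);
}
```

A caller would declare `uint64_t n;` and invoke `count_sw_event(PERF_COUNT_SW_PAGE_FAULTS, touch_pages, &n)`, which corresponds to the page-faults entry in the list; PERF_COUNT_SW_CONTEXT_SWITCHES corresponds to context-switches, although with exclude_kernel set that counter may well read lower than what perf stat reports. A return value of -1 usually means the kernel or sandbox does not allow the counter.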

Example

$ cat hello.c

#include <stdio.h>
#include <pthread.h>

#define NUM_THREAD  10
#define NUM_INCREASE    1000000

int cnt_global = 0;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

void* ThreadFunc(void* arg) {
    long cnt_local = 0;

    pthread_mutex_lock(&mutex);
    for (int i = 0; i < NUM_INCREASE; i++) {
        cnt_global++;
        cnt_local++;
    }

    pthread_mutex_unlock(&mutex);
    return (void*) cnt_local;
}

int main(void) {
    pthread_t threads[NUM_THREAD];

    //create threads
    for (int i = 0; i < NUM_THREAD; i++) {
        // pthread_create returns 0 on success and a positive error number on failure
        if (pthread_create(&threads[i], NULL, ThreadFunc, NULL) != 0) {
            printf("pthread_create error!\n");
            return 1;
        }
    }

    // wait for every thread to finish and collect its local count
    void *ret;
    for (int i = 0; i < NUM_THREAD; i++) {
        pthread_join(threads[i], &ret);
        printf("thread %lu, local count : %ld\n", (unsigned long)threads[i], (long)ret);
    }
    printf("global count : %d\n", cnt_global);

    return 0;
}

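This code creates 10 threads, each of which adds 1,000,000 to the global variable. Each thread holds the mutex for its entire loop, so the increments do not race and the final global count is 10,000,000.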

$ sudo perf stat ./hello

thread 139765062092544, local count : 1000000
thread 139765053699840, local count : 1000000
thread 139765045307136, local count : 1000000
thread 139765036914432, local count : 1000000
thread 139765028521728, local count : 1000000
thread 139765020129024, local count : 1000000
thread 139765011736320, local count : 1000000
thread 139765003343616, local count : 1000000
thread 139764994950912, local count : 1000000
thread 139764986558208, local count : 1000000
global count : 10000000

 Performance counter stats for './hello':

         23.974106      task-clock (msec)         #    0.962 CPUs utilized
                53      context-switches          #    0.002 M/sec
                10      cpu-migrations            #    0.417 K/sec
                80      page-faults               #    0.003 M/sec
   <not supported>      cycles
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
   <not supported>      instructions
   <not supported>      branches
   <not supported>      branch-misses

       0.024918035 seconds time elapsed
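
The values after the `#` in the output above are derived from the raw counters. The sketch below reproduces that arithmetic for this run, on the assumption that rates are computed against task-clock (CPU time) and that "CPUs utilized" is task-clock divided by elapsed wall-clock time; the function names are hypothetical and are not taken from the perf source.

```c
#include <assert.h>
#include <stdio.h>

/* Events per second, measured against task-clock (CPU time in milliseconds). */
double events_per_sec(double count, double task_clock_msec) {
    assert(task_clock_msec > 0.0);
    return count / (task_clock_msec / 1000.0);
}

/* Average number of CPUs kept busy: CPU time divided by wall-clock time. */
double cpus_utilized(double task_clock_msec, double elapsed_sec) {
    assert(elapsed_sec > 0.0);
    return (task_clock_msec / 1000.0) / elapsed_sec;
}

/* Prints the derived column for the raw counters of the run shown above. */
void print_derived(void) {
    const double task_clock_msec = 23.974106;  /* task-clock (msec) */
    const double elapsed_sec = 0.024918035;    /* seconds time elapsed */

    printf("task-clock       : %.3f CPUs utilized\n",
           cpus_utilized(task_clock_msec, elapsed_sec));
    printf("context-switches : %.3f M/sec\n",
           events_per_sec(53, task_clock_msec) / 1e6);
    printf("cpu-migrations   : %.3f K/sec\n",
           events_per_sec(10, task_clock_msec) / 1e3);
    printf("page-faults      : %.3f M/sec\n",
           events_per_sec(80, task_clock_msec) / 1e6);
}
```

Calling `print_derived()` from a `main` function should give values that agree with the printed column: 10 migrations over roughly 0.024 s of CPU time is about 417 per second. The utilization of just under one CPU fits the program's structure, since each thread holds the mutex for its whole loop and the threads therefore run mostly one after another.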