rlavaee / codestitcher

Interprocedural Basic Block Code Layout Optimization
http://cs.rochester.edu/u/rlavaee/
Other
17 stars 2 forks source link

NaN in ruby results #9

Open mahmoodn opened 4 years ago

mahmoodn commented 4 years ago

I tried to attach the perf command to a process and I allowed it to capture for some seconds

$ sudo ~/codestitcher/build/perf/perf record -e instructions:u --branch-filter any,u -o perf.data -p 32594
^C[ perf record: Woken up 96 times to write data ]
[ perf record: Captured and wrote 25.136 MB perf.data (61045 samples) ]

Now that, when I run the buy command, the stats are NaN.

$ sudo ruby ~/codestitcher/scripts/gen_layout.rb -r nginx -p perf.data -L CS -D 4096
PERF FILES ARE: ["perf.data"]
reading profile: /home/mahmood/nginx-1.14.2/install/sbin/perf.data
reading exe syms for: /home/mahmood/nginx-1.14.2/install/sbin/nginx
reading exe syms completed
reading profile completed
TOTAL WEIGHT: 0
(DYNAMIC) DISTANCE LIMIT: 4096
without affinity, chains: 0
(STATIC INTRA_FUNC) DISTANCE LIMIT: 4096
without affinity, chains: 0
RETURNS:false
ORIGINAL
CALLS: 16 => 0(NaN%) , 64 => 0(NaN%) , 4096 => 0(NaN%) , 262144 => 0(NaN%) , 2097152 => 0(NaN%) ,  => 0(NaN%)
OPTIMIZED
CALLS: 16 => 0(NaN%) , 64 => 0(NaN%) , 4096 => 0(NaN%) , 262144 => 0(NaN%) , 2097152 => 0(NaN%) ,  => 0(NaN%)

Is there any way to debug more?

rlavaee commented 4 years ago

The log says 96 samples have been taken. Maybe none of the samples are mapped to the target binary resulting in divison by zero

Sent from my iPhone

On Apr 11, 2020, at 7:53 AM, Mahmood Naderan notifications@github.com wrote:

 I tried to attach the perf command to a process and I allowed it to capture for some seconds

$ sudo ~/codestitcher/build/perf/perf record -e instructions:u --branch-filter any,u -o perf.data -p 32594 ^C[ perf record: Woken up 96 times to write data ] [ perf record: Captured and wrote 25.136 MB perf.data (61045 samples) ]

Now that, when I run the buy command, the stats are NaN.

$ sudo ruby ~/codestitcher/scripts/gen_layout.rb -r nginx -p perf.data -L CS -D 4096 PERF FILES ARE: ["perf.data"] reading profile: /home/mahmood/nginx-1.14.2/install/sbin/perf.data reading exe syms for: /home/mahmood/nginx-1.14.2/install/sbin/nginx reading exe syms completed reading profile completed TOTAL WEIGHT: 0 (DYNAMIC) DISTANCE LIMIT: 4096 without affinity, chains: 0 (STATIC INTRA_FUNC) DISTANCE LIMIT: 4096 without affinity, chains: 0 RETURNS:false ORIGINAL CALLS: 16 => 0(NaN%) , 64 => 0(NaN%) , 4096 => 0(NaN%) , 262144 => 0(NaN%) , 2097152 => 0(NaN%) , => 0(NaN%) OPTIMIZED CALLS: 16 => 0(NaN%) , 64 => 0(NaN%) , 4096 => 0(NaN%) , 262144 => 0(NaN%) , 2097152 => 0(NaN%) , => 0(NaN%) Is there any way to debug more?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.

mahmoodn commented 4 years ago

Do you mean the following output where there is no id for the binary file?

$ sudo ~/codestitcher/build/perf/perf buildid-list -i perf.data                              3a2171019937a2070663f3b6419330223bd64e96 [kernel.kallsyms]
b5381a457906d279073822a5ceb24c4bfef94ddb /lib/x86_64-linux-gnu/libc-2.23.so
ce17e023542265fc11d9bc8f534bb4f070493d30 /lib/x86_64-linux-gnu/libpthread-2.23.so
631c69f65c00d3e6d3ee6108202e9c323a21ce28 [vdso]

The command that I used is

$ sudo ~/codestitcher/build/perf/perf record -e instructions:u --branch-filter any,u -o perf.data -p 32594

which attaches to a running process (cpu util 100%). So the command looks fine. I wonder why no information about the binary is recorded.

mahmoodn commented 4 years ago

@rlavaee

According to this line in ruby script, the field brstackcf should be read. When I run

sudo ~/codestitcher/build/perf/perf script --no-demangle --fields brstackcf -i perf.data

nothing is shown. Instead, if I use brstack in the field, I see some outputs. Moreover, according to the manual, there is no field named brstackcf

    -F, --fields <str>    comma separated output fields prepend with 'type:'. +field to add and -field to remove.Valid types: hw,sw,trace,raw,synth. Fields: comm,tid,pid,time,cpu,event,trace,ip,sym,dso,addr,symoff,period,iregs,uregs,brstack,brstacksym,flags,bpf-output,callindent,insn,insnlen,brstackinsn,synth,phys_addr

Did you add something in the build time to include brstachcf in the perf.data?

mahmoodn commented 4 years ago

@rlavaee Can you explain which information the layout generator is searching? I would like to know why it generates zero stats while perf.data contains information about the binary. I have compared some perf commands for both the test code and my binary.

For example:

$ sudo ~/codestitcher/build/perf/perf report -i perf.data
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1M of event 'instructions:u'
# Event count (approx.): 1997630
#
# Overhead  Command  Source Shared Object  Source Symbol                                     Target Symbol
# ........  .......  ....................  ................................................  ....................................
#
    14.51%  nginx    nginx                 [.] ngx_vslprintf                                 [.] ngx_vslprintf
     3.00%  nginx    libc-2.23.so          [.] __memcpy_sse2_unaligned                       [.] __memcpy_sse2_unaligned
     2.93%  nginx    nginx                 [.] ngx_palloc                                    [.] ngx_palloc
     2.42%  nginx    nginx                 [.] ngx_http_log_handler                          [.] ngx_http_log_handler
     2.41%  nginx    nginx                 [.] ngx_http_parse_header_line                    [.] ngx_http_parse_header_line
     2.30%  nginx    libc-2.23.so          [.] __memset_sse2                                 [.] __memset_sse2
     2.06%  nginx    nginx                 [.] ngx_http_parse_request_line                   [.] ngx_http_parse_request_line
     2.00%  nginx    libc-2.23.so          [.] _int_malloc                                   [.] _int_malloc
     1.94%  nginx    libc-2.23.so          [.] _int_free                                     [.] _int_free
     1.54%  nginx    nginx                 [.] ngx_http_handler                              [.] ngx_http_handler
     1.44%  nginx    nginx                 [.] ngx_http_header_filter                        [.] ngx_http_header_filter
     1.42%  nginx    nginx                 [.] ngx_http_write_filter                         [.] ngx_http_write_filter
     1.39%  nginx    nginx                 [.] memcpy@plt                                    [.] __memcpy_sse2_unaligned
...

OR

$ sudo ~/codestitcher/build/perf/perf script --header -i perf.data
# ========
# captured on: Sun Apr 19 14:34:36 2020
# hostname : hpcc-ubuntu
# os release : 4.15.0-96-generic
# perf version : 4.16.7
# arch : x86_64
# nrcpus online : 32
# nrcpus avail : 32
# cpudesc : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
# cpuid : GenuineIntel,6,45,7
# total memory : 49419652 kB
# cmdline : /home/mahmood/codestitcher/build/perf/perf record -e instructions:u --branch-filter any,u -o perf.data -p 10558
# event : name = instructions:u, , size = 112, config = 0x1, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|PER
# CPU_TOPOLOGY info available, use -I to display
# NUMA_TOPOLOGY info available, use -I to display
# contains samples with branch stack
# pmu mappings: software = 1, uncore_ha = 7, uncore_imc_3 = 11, uncore_r2pcie = 12, uncore_cbox_6 = 22, uncore_imc_1 = 9, power =
# CACHE info available, use -I to display
# time of first sample : 101194.313677
# time of last sample : 101225.653655
# sample duration :  31339.978 ms
# missing features: TRACING_DATA GROUP_DESC AUXTRACE STAT
# ========
#
           nginx 10558 101194.313677:          1 instructions:u:            42b997 ngx_linux_sendfile_chain (/home/mahmood/nginx-
           nginx 10558 101194.313697:          1 instructions:u:            42b99a ngx_linux_sendfile_chain (/home/mahmood/nginx-
           nginx 10558 101194.313707:          3 instructions:u:            42ba72 ngx_linux_sendfile_chain (/home/mahmood/nginx-
           nginx 10558 101194.313717:         13 instructions:u:            42b80b ngx_linux_sendfile_chain (/home/mahmood/nginx-
           nginx 10558 101194.313728:         50 instructions:u:            408272 ngx_chain_update_sent (/home/mahmood/nginx-1.1
           nginx 10558 101194.313739:        189 instructions:u:            42bae3 ngx_linux_sendfile_chain (/home/mahmood/nginx-
           nginx 10558 101194.313750:        705 instructions:u:            4365cd ngx_http_finalize_request (/home/mahmood/nginx
           nginx 10558 101194.313763:       2522 instructions:u:            43c083 ngx_http_log_variable_getlen (/home/mahmood/ng
           nginx 10558 101194.313793:       8295 instructions:u:            438c29 ngx_http_keepalive_handler (/home/mahmood/ngin
           nginx 10558 101194.313843:      15970 instructions:u:            409a24 ngx_vslprintf (/home/mahmood/nginx-1.14.2/inst
           nginx 10558 101194.313939:      23930 instructions:u:            43fa60 ngx_http_variable_remote_addr (/home/mahmood/n
           nginx 10558 101194.314029:      28760 instructions:u:            40970e ngx_vslprintf (/home/mahmood/nginx-1.14.2/inst
           nginx 10558 101194.314136:      35223 instructions:u:      7fd3eb9e1a28 __memcpy_sse2_unaligned (/lib/x86_64-linux-gnu
           nginx 10558 101194.314274:      41052 instructions:u:            42f34e ngx_http_core_post_access_phase (/home/mahmood

So, I wonder why layout generator returns zero stats.

$ sudo ruby ~/codestitcher/scripts/gen_layout.rb -r nginx -p perf.data -L CS -D 4096
PERF FILES ARE: ["perf.data"]
reading profile: /home/mahmood/nginx-1.14.2/install/sbin/perf.data
reading exe syms for: /home/mahmood/nginx-1.14.2/install/sbin/nginx
reading exe syms completed
reading profile completed
TOTAL WEIGHT: 0
(DYNAMIC) DISTANCE LIMIT: 4096
without affinity, chains: 0
(STATIC INTRA_FUNC) DISTANCE LIMIT: 4096
without affinity, chains: 0
RETURNS:false
ORIGINAL
CALLS: 16 => 0(NaN%) , 64 => 0(NaN%) , 4096 => 0(NaN%) , 262144 => 0(NaN%) , 2097152 => 0(NaN%) ,  => 0(NaN%)
OPTIMIZED
CALLS: 16 => 0(NaN%) , 64 => 0(NaN%) , 4096 => 0(NaN%) , 262144 => 0(NaN%) , 2097152 => 0(NaN%) ,  => 0(NaN%)