jebtang / likwid

Automatically exported from code.google.com/p/likwid
GNU General Public License v3.0

Validate Phi events #126

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I was surprised to see L2_DATA_READ_MISS_MEM_FILL and L2_DATA_READ_MISS_CACHE_FILL 
reported as 0 for a few programs when measured in PMC1, so I made a comparison with 
VTune, and I suspect there is a problem with likwid-perfctr.

Later, when I ran an experiment measuring only one event with PMC0, I got some small 
values. I don't see any documentation saying that these events can be measured only 
with PMC0. There is also a huge difference in L2_DATA_WRITE_MISS_MEM_FILL and 
L2_DATA_READ_MISS_CACHE_FILL compared to VTune. Can you help me with this?

What steps will reproduce the problem?
1. I ran an array copying program using both VTUNE and likwid.

What is the expected output? What do you see instead?

Event                          likwid        VTune
DATA_READ                      200292452     216400000
DATA_WRITE                     130125076     380400000
BANK_CONFLICTS                 5197          100000
BRANCHES                       110225835     123100000
INSTRUCTIONS_EXECUTED          1291182962    1370940000
DATA_READ_OR_WRITE             330451293     609500000
DATA_READ_MISS_OR_WRITE_MISS   42962603      74130000
L2_DATA_READ_MISS_CACHE_FILL   4389          120000
L2_DATA_WRITE_MISS_CACHE_FILL  129949990     129870000
L2_DATA_READ_MISS_MEM_FILL     200038669     199920000
L2_DATA_WRITE_MISS_MEM_FILL    3876          30140000
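
To make the size of each discrepancy easier to see, the relative deviation of the likwid count from the VTune count can be computed per event. The following is only a minimal sketch: the numbers are copied from the table above, while the function names and the 50% threshold are arbitrary choices made for illustration and are not part of likwid or VTune.

```python
# Counts copied from the table above, as (likwid, VTune) pairs.
counts = {
    "DATA_READ": (200292452, 216400000),
    "DATA_WRITE": (130125076, 380400000),
    "BANK_CONFLICTS": (5197, 100000),
    "BRANCHES": (110225835, 123100000),
    "INSTRUCTIONS_EXECUTED": (1291182962, 1370940000),
    "DATA_READ_OR_WRITE": (330451293, 609500000),
    "DATA_READ_MISS_OR_WRITE_MISS": (42962603, 74130000),
    "L2_DATA_READ_MISS_CACHE_FILL": (4389, 120000),
    "L2_DATA_WRITE_MISS_CACHE_FILL": (129949990, 129870000),
    "L2_DATA_READ_MISS_MEM_FILL": (200038669, 199920000),
    "L2_DATA_WRITE_MISS_MEM_FILL": (3876, 30140000),
}


def relative_deviation(likwid, vtune):
    """Deviation of the likwid count relative to the VTune count."""
    return (likwid - vtune) / vtune


def flag_outliers(table, threshold=0.5):
    """Names of events whose counts differ by more than `threshold` (hypothetical cutoff)."""
    return sorted(
        name
        for name, (likwid, vtune) in table.items()
        if abs(relative_deviation(likwid, vtune)) > threshold
    )


if __name__ == "__main__":
    for name, (likwid, vtune) in counts.items():
        print(f"{name:<30} {relative_deviation(likwid, vtune):+9.2%}")
    print("Events off by more than 50%:", flag_outliers(counts))
```

With this cutoff, the two L2 fill events that agree with VTune to within a fraction of a percent stand apart clearly from the ones that are off by orders of magnitude.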

What version of the product are you using?
likwid-perfctr  3.0 

Please provide any additional information below.
I ran it for the add program of the STREAM benchmark with just 1 thread.

Original issue reported on code.google.com by sur...@gmail.com on 6 Feb 2014 at 4:20

GoogleCodeExporter commented 9 years ago
Hi, the event IDs and umasks I use for these events follow the 
documentation. The documentation says that FUB is CRI :-). No idea what 
this means; they do not introduce those terms. The only way to tell who is right 
is to compare against a microbenchmark where you know the result. I plan to do 
this for Phi as well. I find it suspicious that the VTune results are all flat to 
the fifth digit. Are those end-to-end measurements?

Original comment by jan.trei...@gmail.com on 12 Feb 2014 at 3:35

GoogleCodeExporter commented 9 years ago
I am running the same program pinned to different cores on the Xeon Phi and 
measuring the same event, and the values differ from core to core. 
Please check the results of multiple runs below.

~/perf_anal $ /home/snataraj/perf_anal/likwid/likwid-perfctr -g L2_READ_HIT_M:PMC0 -C 58 -O /home/snataraj/perf_anal/copy
-------------------------------------------------------------
-------------------------------------------------------------
CPU type:       Intel Xeon Phi Coprocessor
CPU clock:      1.05 GHz
-------------------------------------------------------------
/home/snataraj/perf_anal/copy
K=1048577
Status: 0x0

Event,core 58
L2_READ_HIT_M,10761.000000

~/perf_anal $ /home/snataraj/perf_anal/likwid/likwid-perfctr -g L2_READ_HIT_M:PMC0 -C 58 -O /home/snataraj/perf_anal/copy
-------------------------------------------------------------
-------------------------------------------------------------
CPU type:       Intel Xeon Phi Coprocessor
CPU clock:      1.05 GHz
-------------------------------------------------------------
/home/snataraj/perf_anal/copy
K=1048577
Status: 0x0

Event,core 58
L2_READ_HIT_M,11010.000000

~/perf_anal $ ./work_1.sh
~/perf_anal $ /home/snataraj/perf_anal/likwid/likwid-perfctr -g L2_READ_HIT_M:PMC0 -C 40 -O /home/snataraj/perf_anal/copy
-------------------------------------------------------------
-------------------------------------------------------------
CPU type:       Intel Xeon Phi Coprocessor
CPU clock:      1.05 GHz
-------------------------------------------------------------
/home/snataraj/perf_anal/copy
K=1048577
Status: 0x0

Event,core 40
L2_READ_HIT_M,0.000000

~/perf_anal $ /home/snataraj/perf_anal/likwid/likwid-perfctr -g L2_READ_HIT_M:PMC0 -C 10 -O /home/snataraj/perf_anal/copy
-------------------------------------------------------------
-------------------------------------------------------------
CPU type:       Intel Xeon Phi Coprocessor
CPU clock:      1.05 GHz
-------------------------------------------------------------
/home/snataraj/perf_anal/copy
K=1048577
Status: 0x0

Event,core 10
L2_READ_HIT_M,10768.000000

~/perf_anal $ /home/snataraj/perf_anal/likwid/likwid-perfctr -g L2_READ_HIT_M:PMC0 -C 40 -O /home/snataraj/perf_anal/copy
-------------------------------------------------------------
-------------------------------------------------------------
CPU type:       Intel Xeon Phi Coprocessor
CPU clock:      1.05 GHz
-------------------------------------------------------------
/home/snataraj/perf_anal/copy
K=1048577
Status: 0x0

Event,core 40
L2_READ_HIT_M,0.000000

~/perf_anal $ /home/snataraj/perf_anal/likwid/likwid-perfctr -g L2_READ_HIT_M:PMC0 -C 54 -O -m /home/snataraj/perf_anal/copy
-------------------------------------------------------------
-------------------------------------------------------------
CPU type:       Intel Xeon Phi Coprocessor
CPU clock:      1.05 GHz
-------------------------------------------------------------
/home/snataraj/perf_anal/copy
K=1048577
Status: 0x0

=====================
Region: Compute
=====================

Region Info,core 54
RDTSC Runtime [s],0.021591
call count,1.000000

Event,core 54
L2_READ_HIT_M,158.000000

~/perf_anal $ /home/snataraj/perf_anal/likwid/likwid-perfctr -g L2_READ_HIT_M:PMC0 -C 4 -O /home/snataraj/perf_anal/copy
-------------------------------------------------------------
-------------------------------------------------------------
CPU type:       Intel Xeon Phi Coprocessor
CPU clock:      1.05 GHz
-------------------------------------------------------------
/home/snataraj/perf_anal/copy
K=1048577
Status: 0x0

Event,core 4
L2_READ_HIT_M,524291.000000
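
Comparing many runs by eye is error-prone, so the comma-separated output shown above can be collected with a small script. This is a rough sketch that assumes the output always looks like the runs pasted here (an `Event,core N` header followed by `EVENT,value` lines for a single core). The function name `parse_perfctr_csv` is made up for this example and is not part of likwid.

```python
def parse_perfctr_csv(text):
    """Collect {core: {event: count}} from 'Event,core N' blocks.

    Assumes one core per run, as in the output pasted in this thread.
    Header lines, 'Region Info' blocks, and program output are skipped.
    """
    results = {}
    core = None  # core of the event block currently being read, if any
    for line in text.splitlines():
        fields = line.strip().split(",")
        if len(fields) != 2:
            core = None  # a blank or non-CSV line ends the current block
            continue
        key, value = fields
        if key == "Event" and value.startswith("core "):
            core = int(value.split()[1])
            results.setdefault(core, {})
        elif core is not None:
            results[core][key] = float(value)
    return results


if __name__ == "__main__":
    # Two event blocks copied from the runs above.
    sample = (
        "Event,core 40\n"
        "L2_READ_HIT_M,0.000000\n"
        "\n"
        "Event,core 4\n"
        "L2_READ_HIT_M,524291.000000\n"
    )
    for core, events in sorted(parse_perfctr_csv(sample).items()):
        print(core, events)
```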

Original comment by sur...@gmail.com on 13 Feb 2014 at 6:10

GoogleCodeExporter commented 9 years ago
Reply from the Intel forum:

FUB must stand for something like "Functional Unit Block" because P54C 
refers to the processor core (P54C is a specific version of the Pentium 
core, though the actual core in the Xeon Phi has been heavily upgraded 
from the original P54C), CRI refers to the "Cache-Ring-Interface", and 
VPU refers to the "Vector-Processing-Unit". In the Xeon Phi performance 
counters, the UMASK field actually specifies the functional unit for 
which the event is requested, with 0x00 referring to the core (P54C), 
0x10 referring to the CRI, and 0x20 referring to the VPU. This was 
confusing to me at first because on other processors the UMASK 
is almost always used to modify the specific details of what is measured 
by an Event Select code, rather than actually specifying the Unit on 
which the measurements should occur.   On Xeon Phi the UMASK field is 
used in a way that is much closer to what you would expect a "Unit Mask" 
to mean -- it specifies the "Unit" for which you want the measurements 
to be taken.
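
The mapping described in this reply can be written down as a small lookup. This is only an illustrative sketch: the three UMASK values and unit names are taken from the reply above, the function name `unit_for_umask` is hypothetical, and any other UMASK value is reported as unknown rather than guessed.

```python
# UMASK -> functional unit, using only the three values named in the reply above.
UNIT_BY_UMASK = {
    0x00: "P54C (processor core)",
    0x10: "CRI (Cache-Ring-Interface)",
    0x20: "VPU (Vector-Processing-Unit)",
}


def unit_for_umask(umask):
    """Return the unit a Xeon Phi event is counted on, per the reply above."""
    return UNIT_BY_UMASK.get(umask, "unknown")


if __name__ == "__main__":
    for umask in (0x00, 0x10, 0x20, 0x30):
        print(f"UMASK 0x{umask:02x}: {unit_for_umask(umask)}")
```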

Original comment by sur...@gmail.com on 14 Feb 2014 at 10:02