kaul84 / likwid

Automatically exported from code.google.com/p/likwid
GNU General Public License v3.0

likwid-perfctr error with non-sequential nodes #134

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Have your NUMA nodes in non-sequential order (in my case, I have nodes 0 and 
2 only).
2. make; make install
3. likwid-perfctr -a

What is the expected output? What do you see instead?
The program should list available performance groups. Instead, it gives:
ERROR - [./src/numa.c:139] No such file or directory
The error happens when function nodeMeminfo is called with node = 1.

What version of the product are you using?
3.1.1

Original issue reported on code.google.com by martin.i...@gmail.com on 28 Feb 2014 at 7:22

GoogleCodeExporter commented 9 years ago

Original comment by jan.trei...@gmail.com on 8 May 2014 at 12:34

GoogleCodeExporter commented 9 years ago
I assume this problem is fixed in 4.0, but I have no machine to test it on.

Original comment by Thomas.R...@googlemail.com on 5 May 2015 at 9:01

GoogleCodeExporter commented 9 years ago
Not quite. I just tested against rev 591.

./likwid-perfctr -a
works, but the following fails:

likwid-perfctr  -C S1:0  -g FLOPS_DP  ./a.out
ERROR - [./src/numa_hwloc.c:47] No such file or directory

Original comment by martin.i...@gmail.com on 6 May 2015 at 9:40

GoogleCodeExporter commented 9 years ago
likwid-perfctr -a only evaluates text files, so it does not need any of the topology-related code.

Since I cannot test it myself, could you please try the attached patch and report the results?

cd trunk
patch -p0 < likwid-hwloc-unsequential-nodes.patch

If it does not work, could you please send me the output of
ls -la /sys/devices/system/node/node*/meminfo

Original comment by Thomas.R...@googlemail.com on 7 May 2015 at 1:44

Attachments:
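
Each meminfo file listed by that ls command carries the node ID on every line, in the form "Node <id> <key>: <value> kB". Below is a minimal C sketch of reading MemTotal for one OS node ID. The helper names are hypothetical, the sample values used with it are illustrative, and it is not LIKWID's nodeMeminfo. Opening the file for an absent ID fails with ENOENT, which strerror reports as "No such file or directory". That is the message seen throughout this thread.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scans a line such as
 * "Node 2 MemTotal:       16545664 kB" into node and kb, and returns 1
 * only if the key is MemTotal. For other keys it returns 0 (node and kb
 * may still have been overwritten). */
static int parse_memtotal_line(const char *line, int *node, unsigned long *kb)
{
    char key[64];

    if (sscanf(line, "Node %d %63[^:]: %lu", node, key, kb) != 3)
        return 0;
    return strcmp(key, "MemTotal") == 0;
}

/* Hypothetical helper: reads MemTotal (in kB) for an OS node ID.
 * Returns 0 on success, or -1 if the file cannot be opened (for example
 * because no node with that ID exists) or contains no MemTotal line.
 * *kb is only meaningful when 0 is returned. */
static int read_node_memtotal(int node_id, unsigned long *kb)
{
    char path[128];
    char line[256];
    FILE *fp;
    int found = 0;

    snprintf(path, sizeof path, "/sys/devices/system/node/node%d/meminfo", node_id);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;          /* errno is ENOENT when the node ID is absent */
    while (!found && fgets(line, sizeof line, fp) != NULL) {
        int parsed_node;
        if (parse_memtotal_line(line, &parsed_node, kb) && parsed_node == node_id)
            found = 1;
    }
    fclose(fp);
    return found ? 0 : -1;
}
```

When it is called with IDs taken from the node<N> directories rather than 0..count-1, the -1 path is only taken for a node that is genuinely missing.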

GoogleCodeExporter commented 9 years ago
Thank you, Thomas. I think this fixed it. I'm now getting the "counter register 
not supported" error I mentioned on the mailing list. It's probably better to 
continue that discussion there: 
https://groups.google.com/d/msg/likwid-users/7H3GPmbiCj4/vQt_wr4RWNAJ

$ ./likwid-perfctr  -c 1  -g FLOPS_DP ./a.out 
--------------------------------------------------------------------------------
CPU name:       AMD Opteron(TM) Processor 6272
CPU type:       AMD Interlagos processor
CPU clock:      2.10 GHz
Counter register PMC0 not supported or PCI device not available
Counter register PMC1 not supported or PCI device not available
Counter register PMC2 not supported or PCI device not available
Counter register PMC3 not supported or PCI device not available
No event in given event string can be configured

Original comment by martin.i...@gmail.com on 7 May 2015 at 2:18

GoogleCodeExporter commented 9 years ago
OK, thanks for testing. I committed the patch to the trunk. Fixed in rev 605.

Original comment by Thomas.R...@googlemail.com on 7 May 2015 at 2:25

GoogleCodeExporter commented 9 years ago
Thomas,

I just realized that this hasn't been fully fixed. Consider the following:

My machine has nodes 0, 2, 4 and 6, each with 16 cores. So node 0 has cores 
0-15, node 2 has cores 16-31 and so on.

I wrote a simple matrix multiplication program to test likwid-perfctr. All data 
is explicitly allocated on node 0, and the program is single-threaded. Below is 
the output with the thread pinned to one core in each node.

Note that the first two runs (cores 0 and 16) give similar results, but the 
last two (cores 32 and 48) count almost no events.

$ likwid-perfctr -C 0 -g NUMA ./a.out
--------------------------------------------------------------------------------
CPU name:       AMD Opteron(TM) Processor 6272
CPU type:       AMD Interlagos processor
CPU clock:      2.10 GHz
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
+----------------------------+---------+-----------+
|            Event           | Counter |   Core 0  | 
+----------------------------+---------+-----------+
| UNC_CPU_TO_DRAM_LOCAL_TO_0 |  UPMC0  | 133301675 | 
| UNC_CPU_TO_DRAM_LOCAL_TO_1 |  UPMC1  |     0     | 
| UNC_CPU_TO_DRAM_LOCAL_TO_2 |  UPMC2  |   132457  | 
| UNC_CPU_TO_DRAM_LOCAL_TO_3 |  UPMC3  |     0     | 
+----------------------------+---------+-----------+

+-------------------------------------------+--------------+
|                   Metric                  |    Core 0    | 
+-------------------------------------------+--------------+
|            Runtime (RDTSC) [s]            | 2.314188e+00 | 
| DRAM read/write local to 0 [MegaEvents/s] | 5.760192e+01 | 
| DRAM read/write local to 1 [MegaEvents/s] |       0      | 
| DRAM read/write local to 2 [MegaEvents/s] | 5.723692e-02 | 
| DRAM read/write local to 3 [MegaEvents/s] |       0      | 
+-------------------------------------------+--------------+

$ likwid-perfctr -C 16 -g NUMA ./a.out
--------------------------------------------------------------------------------
CPU name:       AMD Opteron(TM) Processor 6272
CPU type:       AMD Interlagos processor
CPU clock:      2.10 GHz
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
+----------------------------+---------+-----------+
|            Event           | Counter |  Core 16  | 
+----------------------------+---------+-----------+
| UNC_CPU_TO_DRAM_LOCAL_TO_0 |  UPMC0  |  1635838  | 
| UNC_CPU_TO_DRAM_LOCAL_TO_1 |  UPMC1  |     0     | 
| UNC_CPU_TO_DRAM_LOCAL_TO_2 |  UPMC2  | 131712425 | 
| UNC_CPU_TO_DRAM_LOCAL_TO_3 |  UPMC3  |     0     | 
+----------------------------+---------+-----------+

+-------------------------------------------+--------------+
|                   Metric                  |    Core 16   | 
+-------------------------------------------+--------------+
|            Runtime (RDTSC) [s]            | 2.312083e+00 | 
| DRAM read/write local to 0 [MegaEvents/s] | 7.075171e-01 | 
| DRAM read/write local to 1 [MegaEvents/s] |       0      | 
| DRAM read/write local to 2 [MegaEvents/s] | 5.696701e+01 | 
| DRAM read/write local to 3 [MegaEvents/s] |       0      | 
+-------------------------------------------+--------------+

$ likwid-perfctr -C 32 -g NUMA ./a.out 
--------------------------------------------------------------------------------
CPU name:       AMD Opteron(TM) Processor 6272
CPU type:       AMD Interlagos processor
CPU clock:      2.10 GHz
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
+----------------------------+---------+---------+
|            Event           | Counter | Core 32 | 
+----------------------------+---------+---------+
| UNC_CPU_TO_DRAM_LOCAL_TO_0 |  UPMC0  | 2468693 | 
| UNC_CPU_TO_DRAM_LOCAL_TO_1 |  UPMC1  |    0    | 
| UNC_CPU_TO_DRAM_LOCAL_TO_2 |  UPMC2  |  348677 | 
| UNC_CPU_TO_DRAM_LOCAL_TO_3 |  UPMC3  |    0    | 
+----------------------------+---------+---------+

+-------------------------------------------+--------------+
|                   Metric                  |    Core 32   | 
+-------------------------------------------+--------------+
|            Runtime (RDTSC) [s]            | 2.312734e+00 | 
| DRAM read/write local to 0 [MegaEvents/s] | 1.067435e+00 | 
| DRAM read/write local to 1 [MegaEvents/s] |       0      | 
| DRAM read/write local to 2 [MegaEvents/s] | 1.507640e-01 | 
| DRAM read/write local to 3 [MegaEvents/s] |       0      | 
+-------------------------------------------+--------------+

$ likwid-perfctr -C 48 -g NUMA ./a.out 
--------------------------------------------------------------------------------
CPU name:       AMD Opteron(TM) Processor 6272
CPU type:       AMD Interlagos processor
CPU clock:      2.10 GHz
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
+----------------------------+---------+---------+
|            Event           | Counter | Core 48 | 
+----------------------------+---------+---------+
| UNC_CPU_TO_DRAM_LOCAL_TO_0 |  UPMC0  | 1515669 | 
| UNC_CPU_TO_DRAM_LOCAL_TO_1 |  UPMC1  |    0    | 
| UNC_CPU_TO_DRAM_LOCAL_TO_2 |  UPMC2  |  111452 | 
| UNC_CPU_TO_DRAM_LOCAL_TO_3 |  UPMC3  |    0    | 
+----------------------------+---------+---------+

+-------------------------------------------+--------------+
|                   Metric                  |    Core 48   | 
+-------------------------------------------+--------------+
|            Runtime (RDTSC) [s]            | 2.309161e+00 | 
| DRAM read/write local to 0 [MegaEvents/s] | 6.563722e-01 | 
| DRAM read/write local to 1 [MegaEvents/s] |       0      | 
| DRAM read/write local to 2 [MegaEvents/s] | 4.826515e-02 | 
| DRAM read/write local to 3 [MegaEvents/s] |       0      | 
+-------------------------------------------+--------------+

$ likwid-topology
--------------------------------------------------------------------------------
CPU name:   AMD Opteron(TM) Processor 6272
CPU type:   AMD Interlagos processor
CPU stepping:   2
********************************************************************************
Hardware Thread Topology
********************************************************************************
Sockets:        4
Cores per socket:   8
Threads per core:   2
--------------------------------------------------------------------------------
HWThread    Thread      Core        Socket      Available
0       0       0       0       *
1       0       1       0       *
2       0       2       0       *
3       0       3       0       *
4       0       4       0       *
5       0       5       0       *
6       0       6       0       *
7       0       7       0       *
8       0       0       0       *
9       0       1       0       *
10      0       2       0       *
11      0       3       0       *
12      0       4       0       *
13      0       5       0       *
14      0       6       0       *
15      0       7       0       *
16      0       0       1       *
17      0       1       1       *
18      0       2       1       *
19      0       3       1       *
20      0       4       1       *
21      0       5       1       *
22      0       6       1       *
23      0       7       1       *
24      0       0       1       *
25      0       1       1       *
26      0       2       1       *
27      0       3       1       *
28      0       4       1       *
29      0       5       1       *
30      0       6       1       *
31      0       7       1       *
32      0       0       2       *
33      0       1       2       *
34      0       2       2       *
35      0       3       2       *
36      0       4       2       *
37      0       5       2       *
38      0       6       2       *
39      0       7       2       *
40      0       0       2       *
41      0       1       2       *
42      0       2       2       *
43      0       3       2       *
44      0       4       2       *
45      0       5       2       *
46      0       6       2       *
47      0       7       2       *
48      0       0       3       *
49      0       1       3       *
50      0       2       3       *
51      0       3       3       *
52      0       4       3       *
53      0       5       3       *
54      0       6       3       *
55      0       7       3       *
56      0       0       3       *
57      0       1       3       *
58      0       2       3       *
59      0       3       3       *
60      0       4       3       *
61      0       5       3       *
62      0       6       3       *
63      0       7       3       *
--------------------------------------------------------------------------------
Socket 0:       ( 0 8 1 9 2 10 3 11 4 12 5 13 6 14 7 15 )
Socket 1:       ( 16 24 17 25 18 26 19 27 20 28 21 29 22 30 23 31 )
Socket 2:       ( 32 40 33 41 34 42 35 43 36 44 37 45 38 46 39 47 )
Socket 3:       ( 48 56 49 57 50 58 51 59 52 60 53 61 54 62 55 63 )
--------------------------------------------------------------------------------
********************************************************************************
Cache Topology
********************************************************************************
Level:          1
Size:           16 kB
Cache groups:       ( 0 ) ( 8 ) ( 1 ) ( 9 ) ( 2 ) ( 10 ) ( 3 ) ( 11 ) ( 4 ) ( 12 ) ( 5 ) ( 13 ) ( 6 ) ( 14 ) ( 7 ) ( 15 )
                    ( 16 ) ( 24 ) ( 17 ) ( 25 ) ( 18 ) ( 26 ) ( 19 ) ( 27 ) ( 20 ) ( 28 ) ( 21 ) ( 29 ) ( 22 ) ( 30 ) ( 23 ) ( 31 )
                    ( 32 ) ( 40 ) ( 33 ) ( 41 ) ( 34 ) ( 42 ) ( 35 ) ( 43 ) ( 36 ) ( 44 ) ( 37 ) ( 45 ) ( 38 ) ( 46 ) ( 39 ) ( 47 )
                    ( 48 ) ( 56 ) ( 49 ) ( 57 ) ( 50 ) ( 58 ) ( 51 ) ( 59 ) ( 52 ) ( 60 ) ( 53 ) ( 61 ) ( 54 ) ( 62 ) ( 55 ) ( 63 )
--------------------------------------------------------------------------------
Level:          2
Size:           2 MB
Cache groups:       ( 0 8 ) ( 1 9 ) ( 2 10 ) ( 3 11 ) ( 4 12 ) ( 5 13 ) ( 6 14 ) ( 7 15 )
                    ( 16 24 ) ( 17 25 ) ( 18 26 ) ( 19 27 ) ( 20 28 ) ( 21 29 ) ( 22 30 ) ( 23 31 )
                    ( 32 40 ) ( 33 41 ) ( 34 42 ) ( 35 43 ) ( 36 44 ) ( 37 45 ) ( 38 46 ) ( 39 47 )
                    ( 48 56 ) ( 49 57 ) ( 50 58 ) ( 51 59 ) ( 52 60 ) ( 53 61 ) ( 54 62 ) ( 55 63 )
--------------------------------------------------------------------------------
Level:          3
Size:           6 MB
Cache groups:       ( 0 8 1 9 2 10 3 11 ) ( 4 12 5 13 6 14 7 15 )
                    ( 16 24 17 25 18 26 19 27 ) ( 20 28 21 29 22 30 23 31 )
                    ( 32 40 33 41 34 42 35 43 ) ( 36 44 37 45 38 46 39 47 )
                    ( 48 56 49 57 50 58 51 59 ) ( 52 60 53 61 54 62 55 63 )
--------------------------------------------------------------------------------
********************************************************************************
NUMA Topology
********************************************************************************
NUMA domains:       4
--------------------------------------------------------------------------------
Domain:         0
Processors:     ( 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 )
Distances:      10 16 16 16
Free memory:        8679.72 MB
Total memory:       16076.8 MB
--------------------------------------------------------------------------------
Domain:         2
Processors:     ( 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 )
Distances:      16 10 16 16
Free memory:        10122.7 MB
Total memory:       16157.9 MB
--------------------------------------------------------------------------------
Domain:         4
Processors:     ( 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 )
Distances:      16 16 10 16
Free memory:        11589.6 MB
Total memory:       16157.9 MB
--------------------------------------------------------------------------------
Domain:         6
Processors:     ( 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 )
Distances:      16 16 16 10
Free memory:        8919.55 MB
Total memory:       16141.9 MB
--------------------------------------------------------------------------------

$ numactl -H
available: 4 nodes (0,2,4,6)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 16076 MB
node 0 free: 8674 MB
node 2 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 2 size: 16157 MB
node 2 free: 10121 MB
node 4 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
node 4 size: 16157 MB
node 4 free: 11596 MB
node 6 cpus: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
node 6 size: 16141 MB
node 6 free: 8919 MB
node distances:
node   0   2   4   6 
  0:  10  16  16  16 
  2:  16  10  16  16 
  4:  16  16  10  16 
  6:  16  16  16  10 

Original comment by martin.i...@gmail.com on 15 May 2015 at 1:36
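
The two listings above number the same nodes differently. numactl labels the distance columns with the OS node IDs 0, 2, 4 and 6. The likwid-topology Distances rows list four values per domain in ascending node order. Below is a minimal C sketch of translating OS node IDs into such dense indices and of looking up a distance by OS ID. The helper names are hypothetical, and the matrix is copied from the output above.

```c
/* Hypothetical helper: given the sorted list of OS node IDs that are
 * present (e.g. 0, 2, 4, 6), returns the dense index 0..count-1 of
 * node_id, or -1 if that node is not present. */
static int dense_index_of_node(const int *node_ids, int count, int node_id)
{
    int lo = 0, hi = count - 1;

    while (lo <= hi) {              /* binary search; node_ids must be sorted */
        int mid = lo + (hi - lo) / 2;
        if (node_ids[mid] == node_id)
            return mid;
        if (node_ids[mid] < node_id)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return -1;
}

/* Hypothetical helper: looks up the distance between two OS node IDs in a
 * count x count matrix stored row-major in dense (ascending ID) order.
 * Returns -1 if either node is not present. */
static int node_distance(const int *node_ids, int count, const int *dist,
                         int from_node, int to_node)
{
    int i = dense_index_of_node(node_ids, count, from_node);
    int j = dense_index_of_node(node_ids, count, to_node);

    if (i < 0 || j < 0)
        return -1;
    return dist[i * count + j];
}
```

With ids = {0, 2, 4, 6} and the distance rows from the output, node_distance(ids, 4, dist, 2, 6) reads row 1, column 3 and gives 16. A request for node 1 returns -1 instead of indexing a row that does not exist.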

GoogleCodeExporter commented 9 years ago
I fixed this by editing groups/interlagos/NUMA.txt to point to the correct 
nodes (in my case).

Original comment by martin.i...@gmail.com on 15 May 2015 at 3:40

GoogleCodeExporter commented 9 years ago
I ran into this error again. This time it happened while I was using the 
marker API.

$ likwid-perfctr -m -g CACHE -C 0 ./test
--------------------------------------------------------------------------------
CPU name:       AMD Opteron(TM) Processor 6272
CPU type:       AMD Interlagos processor
CPU clock:      2.10 GHz
--------------------------------------------------------------------------------
ERROR - [./src/numa_proc.c:149] No such file or directory
--------------------------------------------------------------------------------
Have you called LIKWID_MARKER_CLOSE?
Cannot find intermediate results file /tmp/likwid_62855.txt

Original comment by martin.i...@gmail.com on 26 May 2015 at 2:54