Original comment by jan.trei...@gmail.com
on 8 May 2014 at 12:34
I assume this problem is fixed in 4.0 but I have no machine to test it.
Original comment by Thomas.R...@googlemail.com
on 5 May 2015 at 9:01
Not quite. I just tested against rev 591.
./likwid-perfctr -a
works, but
likwid-perfctr -C S1:0 -g FLOPS_DP ./a.out
fails with
ERROR - [./src/numa_hwloc.c:47] No such file or directory
Original comment by martin.i...@gmail.com
on 6 May 2015 at 9:40
likwid-perfctr -a only evaluates text files, so it needs no topology-related
code. Since I cannot test this myself, could you please try the attached patch
and tell me the results?
cd trunk
patch -p0 < likwid-hwloc-unsequential-nodes.patch
If it does not work, could you please send me the output of
ls -la /sys/devices/system/node/node*/meminfo
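For context, the patch name suggests the trouble comes from NUMA node IDs that are not consecutive. A minimal Python sketch of gap-tolerant enumeration follows. It is not LIKWID's code; the function names `list_numa_nodes` and `meminfo_paths` and the `sysfs_root` parameter are made up for illustration. The idea is to list the `nodeN` directories that actually exist instead of counting from 0 to N-1.

```python
import os
import re


def list_numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Return the sorted IDs of the nodeN directories found under sysfs_root."""
    node_ids = []
    for entry in os.listdir(sysfs_root):
        # Only names of the form node<digits> count; other entries are skipped.
        match = re.fullmatch(r"node(\d+)", entry)
        if match and os.path.isdir(os.path.join(sysfs_root, entry)):
            node_ids.append(int(match.group(1)))
    return sorted(node_ids)


def meminfo_paths(sysfs_root="/sys/devices/system/node"):
    """Build one meminfo path per existing node, so gaps in the IDs are harmless."""
    return [
        os.path.join(sysfs_root, f"node{node_id}", "meminfo")
        for node_id in list_numa_nodes(sysfs_root)
    ]
```

Calling `meminfo_paths()` on a Linux machine should give the same files that the `ls` command above lists, whatever the node numbering is.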
Original comment by Thomas.R...@googlemail.com
on 7 May 2015 at 1:44
Thank you, Thomas. I think this fixed it. I'm now getting the "counter register
not supported" error I mentioned on the mailing list, so it is probably better
to continue the discussion there:
https://groups.google.com/d/msg/likwid-users/7H3GPmbiCj4/vQt_wr4RWNAJ
$ ./likwid-perfctr -c 1 -g FLOPS_DP ./a.out
--------------------------------------------------------------------------------
CPU name: AMD Opteron(TM) Processor 6272
CPU type: AMD Interlagos processor
CPU clock: 2.10 GHz
Counter register PMC0 not supported or PCI device not available
Counter register PMC1 not supported or PCI device not available
Counter register PMC2 not supported or PCI device not available
Counter register PMC3 not supported or PCI device not available
No event in given event string can be configured
Original comment by martin.i...@gmail.com
on 7 May 2015 at 2:18
OK, thanks for testing. I committed the patch to the trunk. Fixed in rev 605.
Original comment by Thomas.R...@googlemail.com
on 7 May 2015 at 2:25
Thomas,
I just realized that this hasn't been fully fixed. Consider the following:
My machine has NUMA nodes 0, 2, 4 and 6, each with 16 cores: node 0 has cores
0-15, node 2 has cores 16-31, and so on.
I wrote a simple single-threaded matrix multiplication program to test
likwid-perfctr. All data is explicitly allocated on node 0. Below is the output
with the thread pinned to one core of each node in turn.
Note that the first two runs (cores 0 and 16) give similar results, but the
last two (cores 32 and 48) count almost no events.
$ likwid-perfctr -C 0 -g NUMA ./a.out
--------------------------------------------------------------------------------
CPU name: AMD Opteron(TM) Processor 6272
CPU type: AMD Interlagos processor
CPU clock: 2.10 GHz
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
+----------------------------+---------+-----------+
| Event                      | Counter | Core 0    |
+----------------------------+---------+-----------+
| UNC_CPU_TO_DRAM_LOCAL_TO_0 | UPMC0   | 133301675 |
| UNC_CPU_TO_DRAM_LOCAL_TO_1 | UPMC1   | 0         |
| UNC_CPU_TO_DRAM_LOCAL_TO_2 | UPMC2   | 132457    |
| UNC_CPU_TO_DRAM_LOCAL_TO_3 | UPMC3   | 0         |
+----------------------------+---------+-----------+
+-------------------------------------------+--------------+
| Metric                                    | Core 0       |
+-------------------------------------------+--------------+
| Runtime (RDTSC) [s]                       | 2.314188e+00 |
| DRAM read/write local to 0 [MegaEvents/s] | 5.760192e+01 |
| DRAM read/write local to 1 [MegaEvents/s] | 0            |
| DRAM read/write local to 2 [MegaEvents/s] | 5.723692e-02 |
| DRAM read/write local to 3 [MegaEvents/s] | 0            |
+-------------------------------------------+--------------+
$ likwid-perfctr -C 16 -g NUMA ./a.out
--------------------------------------------------------------------------------
CPU name: AMD Opteron(TM) Processor 6272
CPU type: AMD Interlagos processor
CPU clock: 2.10 GHz
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
+----------------------------+---------+-----------+
| Event                      | Counter | Core 16   |
+----------------------------+---------+-----------+
| UNC_CPU_TO_DRAM_LOCAL_TO_0 | UPMC0   | 1635838   |
| UNC_CPU_TO_DRAM_LOCAL_TO_1 | UPMC1   | 0         |
| UNC_CPU_TO_DRAM_LOCAL_TO_2 | UPMC2   | 131712425 |
| UNC_CPU_TO_DRAM_LOCAL_TO_3 | UPMC3   | 0         |
+----------------------------+---------+-----------+
+-------------------------------------------+--------------+
| Metric                                    | Core 16      |
+-------------------------------------------+--------------+
| Runtime (RDTSC) [s]                       | 2.312083e+00 |
| DRAM read/write local to 0 [MegaEvents/s] | 7.075171e-01 |
| DRAM read/write local to 1 [MegaEvents/s] | 0            |
| DRAM read/write local to 2 [MegaEvents/s] | 5.696701e+01 |
| DRAM read/write local to 3 [MegaEvents/s] | 0            |
+-------------------------------------------+--------------+
$ likwid-perfctr -C 32 -g NUMA ./a.out
--------------------------------------------------------------------------------
CPU name: AMD Opteron(TM) Processor 6272
CPU type: AMD Interlagos processor
CPU clock: 2.10 GHz
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
+----------------------------+---------+---------+
| Event                      | Counter | Core 32 |
+----------------------------+---------+---------+
| UNC_CPU_TO_DRAM_LOCAL_TO_0 | UPMC0   | 2468693 |
| UNC_CPU_TO_DRAM_LOCAL_TO_1 | UPMC1   | 0       |
| UNC_CPU_TO_DRAM_LOCAL_TO_2 | UPMC2   | 348677  |
| UNC_CPU_TO_DRAM_LOCAL_TO_3 | UPMC3   | 0       |
+----------------------------+---------+---------+
+-------------------------------------------+--------------+
| Metric                                    | Core 32      |
+-------------------------------------------+--------------+
| Runtime (RDTSC) [s]                       | 2.312734e+00 |
| DRAM read/write local to 0 [MegaEvents/s] | 1.067435e+00 |
| DRAM read/write local to 1 [MegaEvents/s] | 0            |
| DRAM read/write local to 2 [MegaEvents/s] | 1.507640e-01 |
| DRAM read/write local to 3 [MegaEvents/s] | 0            |
+-------------------------------------------+--------------+
$ likwid-perfctr -C 48 -g NUMA ./a.out
--------------------------------------------------------------------------------
CPU name: AMD Opteron(TM) Processor 6272
CPU type: AMD Interlagos processor
CPU clock: 2.10 GHz
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
+----------------------------+---------+---------+
| Event                      | Counter | Core 48 |
+----------------------------+---------+---------+
| UNC_CPU_TO_DRAM_LOCAL_TO_0 | UPMC0   | 1515669 |
| UNC_CPU_TO_DRAM_LOCAL_TO_1 | UPMC1   | 0       |
| UNC_CPU_TO_DRAM_LOCAL_TO_2 | UPMC2   | 111452  |
| UNC_CPU_TO_DRAM_LOCAL_TO_3 | UPMC3   | 0       |
+----------------------------+---------+---------+
+-------------------------------------------+--------------+
| Metric                                    | Core 48      |
+-------------------------------------------+--------------+
| Runtime (RDTSC) [s]                       | 2.309161e+00 |
| DRAM read/write local to 0 [MegaEvents/s] | 6.563722e-01 |
| DRAM read/write local to 1 [MegaEvents/s] | 0            |
| DRAM read/write local to 2 [MegaEvents/s] | 4.826515e-02 |
| DRAM read/write local to 3 [MegaEvents/s] | 0            |
+-------------------------------------------+--------------+
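As a cross-check of the four runs above, the MegaEvents/s values appear to be the raw counts divided by the RDTSC runtime and scaled by 1e-6. The Python sketch below reproduces that arithmetic from the printed numbers. The formula is inferred from the output, not taken from the group file, and the names `RUNS` and `mega_events_per_second` are illustrative only.

```python
# (core, runtime in s, UPMC0 count, UPMC2 count), copied from the four runs above.
RUNS = [
    (0, 2.314188, 133301675, 132457),
    (16, 2.312083, 1635838, 131712425),
    (32, 2.312734, 2468693, 348677),
    (48, 2.309161, 1515669, 111452),
]


def mega_events_per_second(count, runtime_s):
    """Scale a raw event count to millions of events per second."""
    return 1.0e-6 * count / runtime_s


for core, runtime_s, upmc0, upmc2 in RUNS:
    rate0 = mega_events_per_second(upmc0, runtime_s)
    rate2 = mega_events_per_second(upmc2, runtime_s)
    total = upmc0 + upmc2  # UPMC1 and UPMC3 read 0 in every run
    print(f"core {core:2d}: to_0 {rate0:.6e}  to_2 {rate2:.6e}  total events {total}")
```

The totals come out at roughly 1.3e8 events for cores 0 and 16 but under 3e6 for cores 32 and 48, which is the gap described in this comment.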
$ likwid-topology
--------------------------------------------------------------------------------
CPU name: AMD Opteron(TM) Processor 6272
CPU type: AMD Interlagos processor
CPU stepping: 2
********************************************************************************
Hardware Thread Topology
********************************************************************************
Sockets: 4
Cores per socket: 8
Threads per core: 2
--------------------------------------------------------------------------------
HWThread Thread Core Socket Available
0 0 0 0 *
1 0 1 0 *
2 0 2 0 *
3 0 3 0 *
4 0 4 0 *
5 0 5 0 *
6 0 6 0 *
7 0 7 0 *
8 0 0 0 *
9 0 1 0 *
10 0 2 0 *
11 0 3 0 *
12 0 4 0 *
13 0 5 0 *
14 0 6 0 *
15 0 7 0 *
16 0 0 1 *
17 0 1 1 *
18 0 2 1 *
19 0 3 1 *
20 0 4 1 *
21 0 5 1 *
22 0 6 1 *
23 0 7 1 *
24 0 0 1 *
25 0 1 1 *
26 0 2 1 *
27 0 3 1 *
28 0 4 1 *
29 0 5 1 *
30 0 6 1 *
31 0 7 1 *
32 0 0 2 *
33 0 1 2 *
34 0 2 2 *
35 0 3 2 *
36 0 4 2 *
37 0 5 2 *
38 0 6 2 *
39 0 7 2 *
40 0 0 2 *
41 0 1 2 *
42 0 2 2 *
43 0 3 2 *
44 0 4 2 *
45 0 5 2 *
46 0 6 2 *
47 0 7 2 *
48 0 0 3 *
49 0 1 3 *
50 0 2 3 *
51 0 3 3 *
52 0 4 3 *
53 0 5 3 *
54 0 6 3 *
55 0 7 3 *
56 0 0 3 *
57 0 1 3 *
58 0 2 3 *
59 0 3 3 *
60 0 4 3 *
61 0 5 3 *
62 0 6 3 *
63 0 7 3 *
--------------------------------------------------------------------------------
Socket 0: ( 0 8 1 9 2 10 3 11 4 12 5 13 6 14 7 15 )
Socket 1: ( 16 24 17 25 18 26 19 27 20 28 21 29 22 30 23 31 )
Socket 2: ( 32 40 33 41 34 42 35 43 36 44 37 45 38 46 39 47 )
Socket 3: ( 48 56 49 57 50 58 51 59 52 60 53 61 54 62 55 63 )
--------------------------------------------------------------------------------
********************************************************************************
Cache Topology
********************************************************************************
Level: 1
Size: 16 kB
Cache groups: ( 0 ) ( 8 ) ( 1 ) ( 9 ) ( 2 ) ( 10 ) ( 3 ) ( 11 ) ( 4 ) ( 12 ) ( 5 ) ( 13 ) ( 6 ) ( 14 ) ( 7 ) ( 15 )
              ( 16 ) ( 24 ) ( 17 ) ( 25 ) ( 18 ) ( 26 ) ( 19 ) ( 27 ) ( 20 ) ( 28 ) ( 21 ) ( 29 ) ( 22 ) ( 30 ) ( 23 ) ( 31 )
              ( 32 ) ( 40 ) ( 33 ) ( 41 ) ( 34 ) ( 42 ) ( 35 ) ( 43 ) ( 36 ) ( 44 ) ( 37 ) ( 45 ) ( 38 ) ( 46 ) ( 39 ) ( 47 )
              ( 48 ) ( 56 ) ( 49 ) ( 57 ) ( 50 ) ( 58 ) ( 51 ) ( 59 ) ( 52 ) ( 60 ) ( 53 ) ( 61 ) ( 54 ) ( 62 ) ( 55 ) ( 63 )
--------------------------------------------------------------------------------
Level: 2
Size: 2 MB
Cache groups: ( 0 8 ) ( 1 9 ) ( 2 10 ) ( 3 11 ) ( 4 12 ) ( 5 13 ) ( 6 14 ) ( 7 15 )
              ( 16 24 ) ( 17 25 ) ( 18 26 ) ( 19 27 ) ( 20 28 ) ( 21 29 ) ( 22 30 ) ( 23 31 )
              ( 32 40 ) ( 33 41 ) ( 34 42 ) ( 35 43 ) ( 36 44 ) ( 37 45 ) ( 38 46 ) ( 39 47 )
              ( 48 56 ) ( 49 57 ) ( 50 58 ) ( 51 59 ) ( 52 60 ) ( 53 61 ) ( 54 62 ) ( 55 63 )
--------------------------------------------------------------------------------
Level: 3
Size: 6 MB
Cache groups: ( 0 8 1 9 2 10 3 11 ) ( 4 12 5 13 6 14 7 15 )
              ( 16 24 17 25 18 26 19 27 ) ( 20 28 21 29 22 30 23 31 )
              ( 32 40 33 41 34 42 35 43 ) ( 36 44 37 45 38 46 39 47 )
              ( 48 56 49 57 50 58 51 59 ) ( 52 60 53 61 54 62 55 63 )
--------------------------------------------------------------------------------
********************************************************************************
NUMA Topology
********************************************************************************
NUMA domains: 4
--------------------------------------------------------------------------------
Domain: 0
Processors: ( 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 )
Distances: 10 16 16 16
Free memory: 8679.72 MB
Total memory: 16076.8 MB
--------------------------------------------------------------------------------
Domain: 2
Processors: ( 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 )
Distances: 16 10 16 16
Free memory: 10122.7 MB
Total memory: 16157.9 MB
--------------------------------------------------------------------------------
Domain: 4
Processors: ( 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 )
Distances: 16 16 10 16
Free memory: 11589.6 MB
Total memory: 16157.9 MB
--------------------------------------------------------------------------------
Domain: 6
Processors: ( 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 )
Distances: 16 16 16 10
Free memory: 8919.55 MB
Total memory: 16141.9 MB
--------------------------------------------------------------------------------
$ numactl -H
available: 4 nodes (0,2,4,6)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 16076 MB
node 0 free: 8674 MB
node 2 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 2 size: 16157 MB
node 2 free: 10121 MB
node 4 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
node 4 size: 16157 MB
node 4 free: 11596 MB
node 6 cpus: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
node 6 size: 16141 MB
node 6 free: 8919 MB
node distances:
node 0 2 4 6
0: 10 16 16 16
2: 16 10 16 16
4: 16 16 10 16
6: 16 16 16 10
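The topology output above shows four NUMA domains whose IDs are 0, 2, 4 and 6, so a node ID cannot double as a position in a four-element list. A small Python sketch that reads the node-to-CPU mapping from `numactl -H` text without assuming consecutive IDs is given below. The helper names `parse_node_cpus` and `node_of_cpu` are hypothetical, and `SAMPLE` is copied from the output above.

```python
import re

SAMPLE = """\
available: 4 nodes (0,2,4,6)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 2 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 4 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
node 6 cpus: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
"""


def parse_node_cpus(text):
    """Map each NUMA node ID to its CPU list, keyed by the real ID."""
    nodes = {}
    for line in text.splitlines():
        match = re.match(r"node (\d+) cpus:(.*)", line)
        if match:
            nodes[int(match.group(1))] = [int(cpu) for cpu in match.group(2).split()]
    return nodes


def node_of_cpu(nodes, cpu):
    """Look up the node ID that owns a CPU instead of computing it."""
    for node_id, cpus in nodes.items():
        if cpu in cpus:
            return node_id
    raise ValueError(f"CPU {cpu} not found in any node")


nodes = parse_node_cpus(SAMPLE)
for cpu in (0, 16, 32, 48):
    # cpu // 16 is the naive guess that assumes node IDs 0, 1, 2, 3.
    print(f"cpu {cpu}: node ID {node_of_cpu(nodes, cpu)}, naive cpu // 16 gives {cpu // 16}")
```

Keying the mapping by the real node ID keeps the lookup correct on this machine, where the naive division agrees with the real ID only for node 0.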
Original comment by martin.i...@gmail.com
on 15 May 2015 at 1:36
I fixed this locally by editing groups/interlagos/NUMA.txt to point to the
correct nodes for my machine.
Original comment by martin.i...@gmail.com
on 15 May 2015 at 3:40
I ran into this error once again, this time while using the marker API.
$ likwid-perfctr -m -g CACHE -C 0 ./test
--------------------------------------------------------------------------------
CPU name: AMD Opteron(TM) Processor 6272
CPU type: AMD Interlagos processor
CPU clock: 2.10 GHz
--------------------------------------------------------------------------------
ERROR - [./src/numa_proc.c:149] No such file or directory
--------------------------------------------------------------------------------
Have you called LIKWID_MARKER_CLOSE?
Cannot find intermediate results file /tmp/likwid_62855.txt
Original comment by martin.i...@gmail.com
on 26 May 2015 at 2:54
Original issue reported on code.google.com by
martin.i...@gmail.com
on 28 Feb 2014 at 7:22