icl-utk-edu / papi


`PAPI_ipc` fails with `Event does not exist` on 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz #126

Open milianw opened 11 months ago

milianw commented 11 months ago

A simple call to PAPI_ipc fails on my Thinkpad P1 Gen4 laptop, whereas this used to work fine on my older laptop and also on my workstation with an AMD CPU. It is not yet a hybrid Intel CPU, and I have elevated my perf privileges (perf stat works just fine).

#include <papi.h>

#include <iostream>

struct Ipc
{
    static Ipc measure()
    {
        Ipc data;
        int ret = PAPI_ipc(&data.realTime, &data.processTime,
                           &data.instructions, &data.ipc);
        if (ret != 0) {
            std::cerr << "IPC measurement failed with code " << ret << ": "
                      << PAPI_strerror(ret) << std::endl;
        }

        return data;
    }

    void print(const char* label) const
    {
        std::cout << label
                  << "\n\trealtime elapsed: " << realTime
                  << ", process time elapsed: " << processTime
                  << "\n\tinstructions executed: " << instructions
                  << ", cycles: " << (instructions / ipc)
                  << ", IPC: " << ipc
                  << "\n";
    }

    float realTime = 0;
    float processTime = 0;
    long long instructions = 0;
    float ipc = 0;
};

int main()
{
    Ipc::measure().print("test");
    return 0;
}

Compiled with:

$ g++ -g -O2 test.cpp -lpapi -o test_papi
$ ./test_papi 
IPC measurement failed with code -7: Event does not exist
test
        realtime elapsed: 0, process time elapsed: 0
        instructions executed: 0, cycles: -nan, IPC: 0

With strace I can see:

perf_event_open({type=PERF_TYPE_HARDWARE, size=0 /* PERF_ATTR_SIZE_??? */, config=PERF_COUNT_HW_INSTRUCTIONS, sample_period=0, sample_type=0, read_format=0, precise_ip=0 /* arbitrary skid */, ...}, 0, -1, -1, 0) = 3
close(3)                                = 0
perf_event_open({type=PERF_TYPE_HARDWARE, size=0 /* PERF_ATTR_SIZE_??? */, config=PERF_COUNT_HW_INSTRUCTIONS, sample_period=0, sample_type=0, read_format=0, precise_ip=0 /* arbitrary skid */, exclude_guest=1, ...}, 0, -1, -1, 0) = 3
close(3)                                = 0

When I instead run strace perf stat -e instructions on some binary I see:

perf_event_open({type=PERF_TYPE_HARDWARE, size=0x88 /* PERF_ATTR_SIZE_??? */, config=PERF_COUNT_HW_INSTRUCTIONS, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, enable_on_exec=1, precise_ip=0 /* arbitrary skid */, exclude_guest=1, ...}, 4762, -1, -1, PERF_FLAG_FD_CLOEXEC) = 3
...
read(3, "\202T \0\0\0\0\0\250\306\6\0\0\0\0\0\250\306\6\0\0\0\0\0", 24) = 24
close(3)                                = 0
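
The 24 bytes returned by read() in the perf stat trace follow from read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING: for a single (non-group) event the kernel returns three 64-bit values, namely the counter value, the time enabled, and the time running. A minimal sketch that decodes the bytes strace printed, assuming a little-endian machine such as x86_64; the names PerfReadout, decodeLe64, decodeReadout and kStraceBytes are illustrative and not part of perf or PAPI.

```cpp
#include <cstdint>

// Layout of a single-event read() when TOTAL_TIME_ENABLED and
// TOTAL_TIME_RUNNING are both set in read_format.
struct PerfReadout {
    std::uint64_t value;        // raw counter value
    std::uint64_t timeEnabled;  // nanoseconds the event was enabled
    std::uint64_t timeRunning;  // nanoseconds the event was actually counting
};

// Decode one little-endian 64-bit integer (hypothetical helper).
std::uint64_t decodeLe64(const unsigned char* p)
{
    std::uint64_t v = 0;
    for (int i = 7; i >= 0; --i)
        v = (v << 8) | p[i];
    return v;
}

PerfReadout decodeReadout(const unsigned char* buf)
{
    return {decodeLe64(buf), decodeLe64(buf + 8), decodeLe64(buf + 16)};
}

// The buffer from the strace line, transcribed from its octal escapes:
// "\202T \0\0\0\0\0" "\250\306\6\0\0\0\0\0" "\250\306\6\0\0\0\0\0"
const unsigned char kStraceBytes[24] = {
    0202, 'T',  ' ', 0, 0, 0, 0, 0,
    0250, 0306, 06,  0, 0, 0, 0, 0,
    0250, 0306, 06,  0, 0, 0, 0, 0,
};
```

Decoded this way, the sample reports 2118786 instructions, with time enabled equal to time running (444072 ns each), which means the event was counting for the whole time it was enabled and was not multiplexed.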

My system:

inxi -GSC -xx
System:
  Host: agathemoarbauer Kernel: 6.6.2-arch1-1 arch: x86_64 bits: 64
    compiler: gcc v: 13.2.1 Desktop: KDE Plasma v: 5.27.9 tk: Qt v: 5.15.11
    wm: kwin_x11 dm: SDDM Distro: Arch Linux
CPU:
  Info: 8-core model: 11th Gen Intel Core i7-11850H bits: 64 type: MT MCP
    arch: Tiger Lake rev: 1 cache: L1: 640 KiB L2: 10 MiB L3: 24 MiB
  Speed (MHz): avg: 969 high: 3506 min/max: 800/4800 cores: 1: 800 2: 800
    3: 800 4: 800 5: 800 6: 800 7: 800 8: 3506 9: 800 10: 800 11: 800 12: 800
    13: 800 14: 800 15: 800 16: 800 bogomips: 79888
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx

gcongiu commented 11 months ago

Hi Milian,

We recently merged commit b514860b3, which fixed a problem with an attribute size in libpfm4. I am not sure whether this is the problem, but could you try reverting the following commit to see if it works again for you?

commit b514860b30cfa6edb4379a580c65948058646eab
Author: Josh Minor @.***
Date: Wed Sep 13 09:53:49 2023 -0500

    Set size of perf_attr_struct prior to getting pfm encoding

    Signed-off-by: Josh Minor ***@***.***

diff --git a/src/components/perf_event/pe_libpfm4_events.c b/src/components/perf_event/pe_libpfm4_events.c
index a4f8e5d4b..859f10c8a 100644
--- a/src/components/perf_event/pe_libpfm4_events.c
+++ b/src/components/perf_event/pe_libpfm4_events.c
@@ -191,6 +191,9 @@ static struct native_event_t *allocate_native_event(
     perf_arg.attr=&ntv_evt->attr;
     perf_arg.fstr=&event_string;
 
+    // set the size of the perf attr struct before getting pfm encoding
+    ntv_evt->attr.size = sizeof(struct perf_event_attr);
+
     /* use user provided name of the event to get the */
     /* perf_event encoding and a fully qualified event string */
     ret = pfm_get_os_event_encoding(name,

Thank you, Giuseppe


gcongiu commented 11 months ago

Also, can you run papi_component_avail, papi_native_avail and papi_avail and attach the output to the response?

Thanks, Giuseppe


milianw commented 11 months ago

Reverting the patch doesn't seem to help.

Here's the output before elevating my privileges: papi_avail.txt papi_component_avail.txt papi_native_avail.txt

Here's the output after elevating my privileges: papi_avail.2.txt papi_component_avail.2.txt papi_native_avail.2.txt

milianw commented 11 months ago

For good measure, I'm also attaching the output of perf list which shows a ton of stuff is actually available on my system (and it works when I use e.g. perf stat or perf record). perf.list.txt

gcongiu commented 11 months ago

Your CPU's microarchitecture is Tiger Lake, for which PAPI currently defines no preset events. That is why your test fails: PAPI_ipc uses the PAPI_TOT_INS and PAPI_TOT_CYC preset events.

adanalis commented 11 months ago

Milian, we don't have access to mobile processors to define and test the presets. If you are willing to run a (somewhat lengthy) benchmarking sweep and send us the output, we can add the presets. If you are interested, I can give you instructions on how to run the tests.

thanks, Anthony


milianw commented 10 months ago

I am willing to contribute back if that helps the project.

But quite frankly I'm pretty surprised by all this - my assumption was that PAPI_ipc uses the same API as perf stat internally, and would thus simply request cycles and instructions - i.e. PERF_COUNT_HW_CPU_CYCLES and PERF_COUNT_HW_INSTRUCTIONS. Why is that not done?

Then, assuming there's a good reason for doing that for the existing covered platforms in PAPI - couldn't you add a generic fallback using these generic counters?
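
For readers following along, here is a minimal sketch of what such a generic fallback could look like, assuming direct use of the Linux perf_event_open syscall with PERF_COUNT_HW_CPU_CYCLES and PERF_COUNT_HW_INSTRUCTIONS. It is not how PAPI implements PAPI_ipc; the names Counts, computeIpc, openCounter, readCounter and measure are hypothetical, and counting user space only is an arbitrary choice made for the example.

```cpp
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#include <cstdint>
#include <cstring>

struct Counts {
    std::uint64_t instructions = 0;
    std::uint64_t cycles = 0;
};

// Guarded division, so a failed measurement yields 0 rather than NaN.
double computeIpc(const Counts& c)
{
    return c.cycles ? static_cast<double>(c.instructions) / static_cast<double>(c.cycles) : 0.0;
}

// Open one generic hardware counter on the calling thread (any CPU).
// groupFd == -1 creates a group leader; returns -1 on failure.
int openCounter(std::uint64_t config, int groupFd)
{
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = config;
    attr.disabled = (groupFd == -1) ? 1 : 0;  // siblings follow the leader's state
    attr.exclude_kernel = 1;                  // user space only (assumption)
    attr.exclude_hv = 1;
    return static_cast<int>(syscall(SYS_perf_event_open, &attr, 0, -1, groupFd, 0UL));
}

// With read_format == 0, read() yields a single 64-bit counter value.
bool readCounter(int fd, std::uint64_t& out)
{
    return read(fd, &out, sizeof(out)) == static_cast<ssize_t>(sizeof(out));
}

// Count cycles and instructions around work(); returns false if the
// kernel refuses either event or a read fails.
template <typename Work>
bool measure(Work&& work, Counts& out)
{
    const int cyclesFd = openCounter(PERF_COUNT_HW_CPU_CYCLES, -1);
    if (cyclesFd < 0)
        return false;
    const int insFd = openCounter(PERF_COUNT_HW_INSTRUCTIONS, cyclesFd);
    if (insFd < 0) {
        close(cyclesFd);
        return false;
    }

    ioctl(cyclesFd, PERF_EVENT_IOC_RESET, PERF_IOC_FLAG_GROUP);
    ioctl(cyclesFd, PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP);
    work();
    ioctl(cyclesFd, PERF_EVENT_IOC_DISABLE, PERF_IOC_FLAG_GROUP);

    const bool ok = readCounter(cyclesFd, out.cycles) && readCounter(insFd, out.instructions);
    close(insFd);
    close(cyclesFd);
    return ok;
}
```

A caller would pass the workload as a callable, for example `Counts c; if (measure([] { /* workload */ }, c)) { /* use computeIpc(c) */ }`. Putting both events in one group keeps them scheduled together, so the ratio is computed over the same interval. Unlike PAPI_ipc, this sketch reports no real or process time.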

adanalis commented 10 months ago

You could probably add icl under icx in papi_events.csv and have several of the events work just fine on your laptop. But we can't do this in the official version of PAPI we distribute without testing first. People rely on PAPI presets having gone through some level of testing and verification; we can't just "fallback" to native events we haven't tested.

thanks, Anthony

milianw commented 10 months ago

Can you please elaborate on that? For the sake of PAPI_ipc, what testing do you need, or what are you measuring if not the same as perf stat -e cycles,instructions? If that does something wrong, then it would be a kernel bug, no? Can you not rely on the kernel to give you correct data?

adanalis commented 10 months ago

We need to check, at the very least, that the event exists. It sounds reasonable to assume that PERF_COUNT_HW_INSTRUCTIONS would always exist, and thus would not need testing, but then again the kernel also lists events that sometimes do not exist, like L1-DCACHE-PREFETCH-MISS. Since you are willing to help, I will follow up with testing instructions.

thanks, Anthony


milianw commented 10 months ago

I have never come across such situations, short of platforms with broken PMUs (like iMX6). It would be really good if PAPI_ipc and similar "simple" APIs worked as long as we can find suitable events in e.g. /sys/bus/event_source/devices/cpu/events.
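
A minimal sketch of the sysfs check suggested here, assuming the kernel exports one file per named event in that directory and that the two events of interest are called "instructions" and "cpu-cycles" (the usual names on x86; other PMUs may differ). The helper names readEventEncoding and ipcEventsAvailable are hypothetical.

```cpp
#include <fstream>
#include <string>

// Return the encoding string the kernel exports for a named event
// (for example "event=0xc0"), or an empty string if the file is absent.
std::string readEventEncoding(const std::string& eventsDir, const std::string& name)
{
    std::ifstream file(eventsDir + "/" + name);
    std::string encoding;
    if (file)
        std::getline(file, encoding);
    return encoding;
}

// True if both events needed for an IPC computation are advertised.
bool ipcEventsAvailable(const std::string& eventsDir)
{
    return !readEventEncoding(eventsDir, "instructions").empty()
        && !readEventEncoding(eventsDir, "cpu-cycles").empty();
}
```

It would be called as `ipcEventsAvailable("/sys/bus/event_source/devices/cpu/events")`. As noted earlier in the thread, an event being listed does not by itself prove that it can be counted, so this is only a first filter.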

thanks