llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.99k stars 11.55k forks source link

[OMPT] `codeptr_ra` is pointing to runtime for certain scenarios #92423

Open Thyre opened 4 months ago

Thyre commented 4 months ago

Issue description

The OpenMP Tools Interface includes several callbacks for the host, which include a value called codeptr_ra. In the specifications, it is described like this (for example for ompt_callback_parallel_begin):

The codeptr_ra argument relates the implementation of an OpenMP region to its source code. If a runtime routine implements the region associated with a callback that has type signature ompt_callback_parallel_begin_t then codeptr_ra contains the return address of the call to that runtime routine. If the implementation of the region is inlined then codeptr_ra contains the return address of the callback invocation. If attribution to source code is impossible or inappropriate, codeptr_ra may be NULL.

In a lot of cases, this is what LLVM is reporting to the tool. However, I have discovered a few select cases where this fails every single time.

Those can be broken down into the following categories:

I will present one example for each of these down below. The full reproducer can be found at the end of this issue.

Taskloop construct

Taskloop constructs cause the work andtask_create callbacks to return libomp.so:__kmpc_taskloop

One can reproduce it with this example:

#include <unistd.h>

int main( void )
{
    #pragma omp taskloop
    for( int i = 0; i < 5; ++i )
    {
    usleep(10);
    }
}

Result:

$ ./a.out
[ompt_start_tool] tid = -1 | omp_version 201611 | runtime_version = 'LLVM OMP version: 5.0.20140926'
[my_initialize_tool] tid = -1 | initial_device_num 0
[...]
[thread_begin_cb] tid = 1 | type = initial
[implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = begin | actual_parallelism = 1 | index = 1 | flags = initial
[sync_region_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = begin | kind = taskgroup | codeptr_ra = ./a.out:(null)
[work_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = begin | work_type = taskloop | count = 5 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop
[task_create_cb] tid = 1 | encountering_task_data = 6660001 | new_task_data = 6660002 | flags = explicit_undeferred | has_dependences = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop
[task_schedule_cb] tid = 1 | prior_task_data = 6660001 | prior_status = switch | next_task_data = 6660002
[task_schedule_cb] tid = 1 | prior_task_data = 6660002 | prior_status = complete | next_task_data = 6660001
[task_create_cb] tid = 1 | encountering_task_data = 6660001 | new_task_data = 6660003 | flags = explicit_undeferred | has_dependences = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop
[task_schedule_cb] tid = 1 | prior_task_data = 6660001 | prior_status = switch | next_task_data = 6660003
[task_schedule_cb] tid = 1 | prior_task_data = 6660003 | prior_status = complete | next_task_data = 6660001
[task_create_cb] tid = 1 | encountering_task_data = 6660001 | new_task_data = 6660004 | flags = explicit_undeferred | has_dependences = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop
[task_schedule_cb] tid = 1 | prior_task_data = 6660001 | prior_status = switch | next_task_data = 6660004
[task_schedule_cb] tid = 1 | prior_task_data = 6660004 | prior_status = complete | next_task_data = 6660001
[task_create_cb] tid = 1 | encountering_task_data = 6660001 | new_task_data = 6660005 | flags = explicit_undeferred | has_dependences = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop
[task_schedule_cb] tid = 1 | prior_task_data = 6660001 | prior_status = switch | next_task_data = 6660005
[task_schedule_cb] tid = 1 | prior_task_data = 6660005 | prior_status = complete | next_task_data = 6660001
[task_create_cb] tid = 1 | encountering_task_data = 6660001 | new_task_data = 6660006 | flags = explicit_undeferred | has_dependences = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop
[task_schedule_cb] tid = 1 | prior_task_data = 6660001 | prior_status = switch | next_task_data = 6660006
[task_schedule_cb] tid = 1 | prior_task_data = 6660006 | prior_status = complete | next_task_data = 6660001
[work_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = end | work_type = taskloop | count = 5 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop
[sync_region_wait_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = begin | kind = taskgroup | codeptr_ra = ./a.out:(null)
[sync_region_wait_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = end | kind = taskgroup | codeptr_ra = ./a.out:(null)
[sync_region_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = end | kind = taskgroup | codeptr_ra = ./a.out:(null)
[implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = end | actual_parallelism = 0 | index = 1 | flags = initial
[thread_end_cb] tid = 1
[my_finalize_tool] tid = 1

Looking at __kmpc_taskloop, it seems like the call to the tool returns the method called directly before via OMPT_GET_RETURN_ADRESS and not the user one.

Cancelling parallel regions

When a parallel region is cancelled, the pointer for the implicit barrier will point to an internal method and not the user code:

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_ITERATIONS 1000

void run_openmp_cancel_example(void) {

#pragma omp parallel default(none)
    {
        float random_number;
        for (int i = 0; i < NUM_ITERATIONS; ++i) {
#pragma omp cancellation point parallel
            random_number = rand() / RAND_MAX;
            if (random_number < 0.20) {
#pragma omp cancel parallel
            }
        }
    }

    if (omp_get_cancellation()) {
        printf("Parallel got cancelled!\n");
    }
}

int main(void) {
    run_openmp_cancel_example();
    return 0;
}
$ OMP_NUM_THREADS=2 OMP_CANCELLATION=true ./a.out
[thread_begin_cb] tid = 1 | type = initial
[implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = begin | actual_parallelism = 1 | index = 1 | flags = initial
[parallel_begin_cb] tid = 1 | parallel_data = 7770001 | encountering_task_data = 6660001 | flags = invoker_runtime_team | requested_parallelism = 2 | codeptr_ra = ./a.out:(null)
[implicit_task_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | actual_parallelism = 2 | index = 0 | flags = implicit
[cancel_cb] tid = 1 | task_data = 6660002 | flags = activated | codeptr_ra = ./a.out:(null)
[sync_region_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[thread_begin_cb] tid = 2 | type = worker
[implicit_task_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | actual_parallelism = 2 | index = 1 | flags = implicit
[cancel_cb] tid = 2 | task_data = 6660003 | flags = detected | codeptr_ra = ./a.out:(null)
[sync_region_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier
[sync_region_wait_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = ./a.out:(null)
[sync_region_wait_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = ./a.out:(null)
[sync_region_wait_cb] tid = 1 | parallel_data = 7777777 | task_data = 6660002 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = ./a.out:(null)
[sync_region_cb] tid = 1 | parallel_data = 7777777 | task_data = 6660002 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = ./a.out:(null)
[implicit_task_cb] tid = 1 | parallel_data = 7777777 | task_data = 6660002 | endpoint = end | actual_parallelism = 2 | index = 0 | flags = implicit
[parallel_end_cb] tid = 1 | parallel_data = 7770001 | encountering_task_data = 6660001 | flags = invoker_runtime_team | codeptr_ra = ./a.out:(null)
Parallel got cancelled!
[implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = end | actual_parallelism = 0 | index = 1 | flags = initial
[thread_end_cb] tid = 1
[sync_region_wait_cb] tid = 2 | parallel_data = 7777777 | task_data = 6660003 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_cb] tid = 2 | parallel_data = 7777777 | task_data = 6660003 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[implicit_task_cb] tid = 2 | parallel_data = 7777777 | task_data = 6660003 | endpoint = end | actual_parallelism = 0 | index = 1 | flags = implicit
[thread_end_cb] tid = 2
[my_finalize_tool] tid = 1

Here, the codeptr_ra points to __kmpc_cancel_barrier which calls the correct barrier.

Target regions

Both target regions and data transfers (ompt_target_emi / ompt_target_data_emi) seem to incorrectly return libomptarget.so for their codeptr_ra

int main( void )
{
    int a[100];
    #pragma omp target data map(to: a[:100])
    {
    #pragma omp target
    {
        a[0] = 1;
    }
    }

    return 0;
}
$ clang -fopenmp --offload-arch=native ./a.out
[thread_begin_cb] tid = 1 | type = initial
[implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = begin | actual_parallelism = 1 | index = 1 | flags = initial
[device_initialize_cb] tid = 1
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_device_num_procs" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_device_time" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_translate_time" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_set_trace_ompt" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_set_trace_native" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_start_trace" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_pause_trace" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_flush_trace" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_stop_trace" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_advance_buffer_cursor" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_record_type" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_record_ompt" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_record_native" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_record_abstract" not provided for sm_75(0)
[device_load_cb] tid = 1
[target_emi_cb] tid = 1 | endpoint = begin | kind = target_enter_data | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:__tgt_target_data_begin_mapper
[target_data_op_emi_cb] tid = 1 | endpoint = begin | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7ffe659e6e70 | dest_addr = (nil) | src_device_num = 1 | dest_device_num = 0 | optype = alloc | bytes = 400 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null)
[target_data_op_emi_cb] tid = 1 | endpoint = end | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7ffe659e6e70 | dest_addr = 0x7084e4600000 | src_device_num = 1 | dest_device_num = 0 | optype = alloc | bytes = 400 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null)
[target_data_op_emi_cb] tid = 1 | endpoint = begin | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7ffe659e6e70 | dest_addr = 0x7084e4600000 | src_device_num = 1 | dest_device_num = 0 | optype = transfer_to_device | bytes = 400 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null)
[target_data_op_emi_cb] tid = 1 | endpoint = end | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7ffe659e6e70 | dest_addr = 0x7084e4600000 | src_device_num = 1 | dest_device_num = 0 | optype = transfer_to_device | bytes = 400 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null)
[target_emi_cb] tid = 1 | endpoint = end | kind = target_enter_data | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:__tgt_target_data_begin_mapper
[target_emi_cb] tid = 1 | endpoint = begin | kind = target | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = ./a.out:(null)
[target_submit_emi_cb] tid = 1 | endpoint = begin
[target_submit_emi_cb] tid = 1 | endpoint = end
[target_emi_cb] tid = 1 | endpoint = end | kind = target | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = ./a.out:(null)
[target_emi_cb] tid = 1 | endpoint = begin | kind = target_exit_data | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:__tgt_target_data_end_mapper
[target_data_op_emi_cb] tid = 1 | endpoint = begin | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7084e4600000 | dest_addr = (nil) | src_device_num = 0 | dest_device_num = -1 | optype = delete | bytes = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null)
[target_data_op_emi_cb] tid = 1 | endpoint = end | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7084e4600000 | dest_addr = (nil) | src_device_num = 0 | dest_device_num = -1 | optype = delete | bytes = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null)
[target_emi_cb] tid = 1 | endpoint = end | kind = target_exit_data | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:__tgt_target_data_end_mapper
[implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = end | actual_parallelism = 0 | index = 1 | flags = initial
[thread_end_cb] tid = 1
[my_finalize_tool] tid = 1

Helper threads

When helper threads are active, their codeptr_ra partially seem to point to incorrect positions in the LLVM runtime. I consider this okay in this certain scenario, since the parallel and masked region is generated by the runtime and not by the user. Here, one can question if these callbacks should even by dispatched to the tool.

Here's a code to reproduce the issue:

int main( void )
{
    int a;

    #pragma omp target nowait map(tofrom: a)
    {
        a = 0;
    }

    return a;
}
$ clang -fopenmp --offload-arch=native ./test.c
$ ./a.out
[thread_begin_cb] tid = 1 | type = initial
[implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = begin | actual_parallelism = 1 | index = 1 | flags = initial
[device_initialize_cb] tid = 1
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_device_num_procs" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_device_time" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_translate_time" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_set_trace_ompt" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_set_trace_native" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_start_trace" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_pause_trace" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_flush_trace" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_stop_trace" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_advance_buffer_cursor" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_record_type" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_record_ompt" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_record_native" not provided for sm_75(0)
[device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_record_abstract" not provided for sm_75(0)
[parallel_begin_cb] tid = -1 | parallel_data = 7770001 | WARNING encountering_task_data = 0 | flags = invoker_runtime_team | requested_parallelism = 8 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:(null)
[thread_begin_cb] tid = 2 | type = worker
[thread_begin_cb] tid = 3 | type = worker
[thread_begin_cb] tid = 4 | type = worker
[thread_begin_cb] tid = 5 | type = worker
[thread_begin_cb] tid = 6 | type = worker
[thread_begin_cb] tid = 7 | type = worker
[implicit_task_cb] tid = -1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | actual_parallelism = 8 | index = 0 | flags = implicit
[implicit_task_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | actual_parallelism = 8 | index = 3 | flags = implicit
[implicit_task_cb] tid = 5 | parallel_data = 7770001 | task_data = 6660004 | endpoint = begin | actual_parallelism = 8 | index = 2 | flags = implicit
[implicit_task_cb] tid = 6 | parallel_data = 7770001 | task_data = 6660005 | endpoint = begin | actual_parallelism = 8 | index = 4 | flags = implicit
[implicit_task_cb] tid = 4 | parallel_data = 7770001 | task_data = 6660006 | endpoint = begin | actual_parallelism = 8 | index = 6 | flags = implicit
[implicit_task_cb] tid = 3 | parallel_data = 7770001 | task_data = 6660007 | endpoint = begin | actual_parallelism = 8 | index = 1 | flags = implicit
[thread_begin_cb] tid = 8 | type = worker
[implicit_task_cb] tid = 8 | parallel_data = 7770001 | task_data = 6660008 | endpoint = begin | actual_parallelism = 8 | index = 5 | flags = implicit
[implicit_task_cb] tid = 7 | parallel_data = 7770001 | task_data = 6660009 | endpoint = begin | actual_parallelism = 8 | index = 7 | flags = implicit
[masked_cb] tid = -1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:(null)
[sync_region_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_wait_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[task_create_cb] tid = 1 | encountering_task_data = 6660001 | new_task_data = 6660010 | flags = explicit_untied | has_dependences = 0 | codeptr_ra = ./a.out:(null)
[task_schedule_cb] tid = 2 | prior_task_data = 6660003 | prior_status = switch | next_task_data = 6660010
[sync_region_cb] tid = 7 | parallel_data = 7770001 | task_data = 6660009 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_wait_cb] tid = 7 | parallel_data = 7770001 | task_data = 6660009 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_cb] tid = 3 | parallel_data = 7770001 | task_data = 6660007 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_wait_cb] tid = 3 | parallel_data = 7770001 | task_data = 6660007 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_cb] tid = 6 | parallel_data = 7770001 | task_data = 6660005 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_wait_cb] tid = 6 | parallel_data = 7770001 | task_data = 6660005 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_cb] tid = 5 | parallel_data = 7770001 | task_data = 6660004 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_wait_cb] tid = 5 | parallel_data = 7770001 | task_data = 6660004 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_cb] tid = 8 | parallel_data = 7770001 | task_data = 6660008 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_wait_cb] tid = 8 | parallel_data = 7770001 | task_data = 6660008 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_cb] tid = 4 | parallel_data = 7770001 | task_data = 6660006 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[sync_region_wait_cb] tid = 4 | parallel_data = 7770001 | task_data = 6660006 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found!
[device_load_cb] tid = 2
[target_emi_cb] tid = 2 | endpoint = begin | kind = target | device_num = 0 | task_data = 6660010 | target_task_data = 0 | target_data = 0 | codeptr_ra = ./a.out:(null)
[target_data_op_emi_cb] tid = 2 | endpoint = begin | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7ffcc3c79518 | dest_addr = (nil) | src_device_num = 1 | dest_device_num = 0 | optype = alloc | bytes = 4 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null)
[sync_region_cb] tid = -1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:(null)
[target_data_op_emi_cb] tid = 2 | endpoint = end | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7ffcc3c79518 | dest_addr = 0x7ee402600000 | src_device_num = 1 | dest_device_num = 0 | optype = alloc | bytes = 4 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null)
[sync_region_wait_cb] tid = -1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:(null)

The barriers, parallel and masked callbacks point to the runtime.

Reproducer

To reproduce the issue, I've changed a simple "ompt-printf" tool to include address resolution via dladdr. While this approach is not able to resolve the exact function name each time, it is sufficient to detect the shared library the codeptr_ra is from. I originally encountered the issue on several different systems where the codeptr_ra were resolved using libbfd.

To reproduce the issue:

  1. Build the OpenMP Tools Interface:
    $ clang -fPIC -shared -fopenmp ompt-tool.c -o libompt-tool.so
    $ export OMP_TOOL_LIBRARIES=$(pwd)/libompt-tool.so
  2. Compile each of the applications and run them

I've tested the issue with LLVM 18.1.2 on Ubuntu 22.04 LTS with CUDA 12.4 and an NVIDIA MX550.

llvmbot commented 4 months ago

@llvm/issue-subscribers-openmp

Author: Jan André Reuter (Thyre)

## Issue description The OpenMP Tools Interface includes several callbacks for the host, which include a value called `codeptr_ra`. In the specifications, it is described like this (for example for `ompt_callback_parallel_begin`): > The codeptr_ra argument relates the implementation of an OpenMP region to its source code. If a runtime routine implements the region associated with a callback that has type signature ompt_callback_parallel_begin_t then codeptr_ra contains the return address of the call to that runtime routine. If the implementation of the region is inlined then codeptr_ra contains the return address of the callback invocation. If attribution to source code is impossible or inappropriate, codeptr_ra may be NULL. In a lot of cases, this is what LLVM is reporting to the tool. However, I have discovered a few select cases where this fails every single time. Those can be broken down into the following categories: - The `taskloop` construct - Cancelling parallel regions - In some cases, the `target` construct - Helper threads in the LLVM runtime I will present one example for each of these down below. The full reproducer can be found at the end of this issue. ### Taskloop construct Taskloop constructs cause the `work` and`task_create` callbacks to return `libomp.so:__kmpc_taskloop` One can reproduce it with this example: ```c #include <unistd.h> int main( void ) { #pragma omp taskloop for( int i = 0; i < 5; ++i ) { usleep(10); } } ``` Result: ```bash $ ./a.out [ompt_start_tool] tid = -1 | omp_version 201611 | runtime_version = 'LLVM OMP version: 5.0.20140926' [my_initialize_tool] tid = -1 | initial_device_num 0 [...] [thread_begin_cb] tid = 1 | type = initial [implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = begin | actual_parallelism = 1 | index = 1 | flags = initial [sync_region_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = begin | kind = taskgroup | codeptr_ra = ./a.out:(null) [work_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = begin | work_type = taskloop | count = 5 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop [task_create_cb] tid = 1 | encountering_task_data = 6660001 | new_task_data = 6660002 | flags = explicit_undeferred | has_dependences = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop [task_schedule_cb] tid = 1 | prior_task_data = 6660001 | prior_status = switch | next_task_data = 6660002 [task_schedule_cb] tid = 1 | prior_task_data = 6660002 | prior_status = complete | next_task_data = 6660001 [task_create_cb] tid = 1 | encountering_task_data = 6660001 | new_task_data = 6660003 | flags = explicit_undeferred | has_dependences = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop [task_schedule_cb] tid = 1 | prior_task_data = 6660001 | prior_status = switch | next_task_data = 6660003 [task_schedule_cb] tid = 1 | prior_task_data = 6660003 | prior_status = complete | next_task_data = 6660001 [task_create_cb] tid = 1 | encountering_task_data = 6660001 | new_task_data = 6660004 | flags = explicit_undeferred | has_dependences = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop [task_schedule_cb] tid = 1 | prior_task_data = 6660001 | prior_status = switch | next_task_data = 6660004 [task_schedule_cb] tid = 1 | prior_task_data = 6660004 | prior_status = complete | next_task_data = 6660001 [task_create_cb] tid = 1 | encountering_task_data = 6660001 | new_task_data = 6660005 | flags = explicit_undeferred | has_dependences = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop [task_schedule_cb] tid = 1 | prior_task_data = 6660001 | prior_status = switch | next_task_data = 6660005 [task_schedule_cb] tid = 1 | prior_task_data = 6660005 | prior_status = complete | next_task_data = 6660001 [task_create_cb] tid = 1 | encountering_task_data = 6660001 | new_task_data = 6660006 | flags = explicit_undeferred | has_dependences = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop [task_schedule_cb] tid = 1 | prior_task_data = 6660001 | prior_status = switch | next_task_data = 6660006 [task_schedule_cb] tid = 1 | prior_task_data = 6660006 | prior_status = complete | next_task_data = 6660001 [work_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = end | work_type = taskloop | count = 5 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_taskloop [sync_region_wait_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = begin | kind = taskgroup | codeptr_ra = ./a.out:(null) [sync_region_wait_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = end | kind = taskgroup | codeptr_ra = ./a.out:(null) [sync_region_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = end | kind = taskgroup | codeptr_ra = ./a.out:(null) [implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = end | actual_parallelism = 0 | index = 1 | flags = initial [thread_end_cb] tid = 1 [my_finalize_tool] tid = 1 ``` Looking at [`__kmpc_taskloop`](https://github.com/llvm/llvm-project/blob/7a66e4209b0b4cc0dc871a54c4f07a4b0054b5f7/openmp/runtime/src/kmp_tasking.cpp#L5381C6-L5381C21), it seems like the [call to the tool](https://github.com/llvm/llvm-project/blob/7a66e4209b0b4cc0dc871a54c4f07a4b0054b5f7/openmp/runtime/src/kmp_tasking.cpp#L5253) returns the method called directly before via `OMPT_GET_RETURN_ADRESS` and not the user one. ### Cancelling parallel regions When a parallel region is cancelled, the pointer for the implicit barrier will point to an internal method and not the user code: ```c #include <omp.h> #include <stdio.h> #include <stdlib.h> #define NUM_ITERATIONS 1000 void run_openmp_cancel_example(void) { #pragma omp parallel default(none) { float random_number; for (int i = 0; i < NUM_ITERATIONS; ++i) { #pragma omp cancellation point parallel random_number = rand() / RAND_MAX; if (random_number < 0.20) { #pragma omp cancel parallel } } } if (omp_get_cancellation()) { printf("Parallel got cancelled!\n"); } } int main(void) { run_openmp_cancel_example(); return 0; } ``` ```console $ OMP_NUM_THREADS=2 OMP_CANCELLATION=true ./a.out [thread_begin_cb] tid = 1 | type = initial [implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = begin | actual_parallelism = 1 | index = 1 | flags = initial [parallel_begin_cb] tid = 1 | parallel_data = 7770001 | encountering_task_data = 6660001 | flags = invoker_runtime_team | requested_parallelism = 2 | codeptr_ra = ./a.out:(null) [implicit_task_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | actual_parallelism = 2 | index = 0 | flags = implicit [cancel_cb] tid = 1 | task_data = 6660002 | flags = activated | codeptr_ra = ./a.out:(null) [sync_region_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier [sync_region_wait_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier [thread_begin_cb] tid = 2 | type = worker [implicit_task_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | actual_parallelism = 2 | index = 1 | flags = implicit [cancel_cb] tid = 2 | task_data = 6660003 | flags = detected | codeptr_ra = ./a.out:(null) [sync_region_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier [sync_region_wait_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier [sync_region_wait_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier [sync_region_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier [sync_region_wait_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier [sync_region_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier [sync_region_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier [sync_region_wait_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier [sync_region_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier [sync_region_wait_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier [sync_region_wait_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier [sync_region_wait_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier [sync_region_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier [sync_region_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found! [sync_region_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomp.so:__kmpc_cancel_barrier [sync_region_wait_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found! [sync_region_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = ./a.out:(null) [sync_region_wait_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = ./a.out:(null) [sync_region_wait_cb] tid = 1 | parallel_data = 7777777 | task_data = 6660002 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = ./a.out:(null) [sync_region_cb] tid = 1 | parallel_data = 7777777 | task_data = 6660002 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = ./a.out:(null) [implicit_task_cb] tid = 1 | parallel_data = 7777777 | task_data = 6660002 | endpoint = end | actual_parallelism = 2 | index = 0 | flags = implicit [parallel_end_cb] tid = 1 | parallel_data = 7770001 | encountering_task_data = 6660001 | flags = invoker_runtime_team | codeptr_ra = ./a.out:(null) Parallel got cancelled! [implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = end | actual_parallelism = 0 | index = 1 | flags = initial [thread_end_cb] tid = 1 [sync_region_wait_cb] tid = 2 | parallel_data = 7777777 | task_data = 6660003 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found! [sync_region_cb] tid = 2 | parallel_data = 7777777 | task_data = 6660003 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = Address (nil) not found! [implicit_task_cb] tid = 2 | parallel_data = 7777777 | task_data = 6660003 | endpoint = end | actual_parallelism = 0 | index = 1 | flags = implicit [thread_end_cb] tid = 2 [my_finalize_tool] tid = 1 ``` Here, the `codeptr_ra` points to `__kmpc_cancel_barrier` which calls the correct barrier. ## Target regions Both target regions and data transfers (`ompt_target_emi` / `ompt_target_data_emi`) seem to incorrectly return `libomptarget.so` for their `codeptr_ra` ```c int main( void ) { int a[100]; #pragma omp target data map(to: a[:100]) { #pragma omp target { a[0] = 1; } } return 0; } ``` ```console $ clang -fopenmp --offload-arch=native ./a.out [thread_begin_cb] tid = 1 | type = initial [implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = begin | actual_parallelism = 1 | index = 1 | flags = initial [device_initialize_cb] tid = 1 [device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_device_num_procs" not provided for sm_75(0) [device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_device_time" not provided for sm_75(0) [device_initialize_cb] tid = 1 | tracing interface entry point "ompt_translate_time" not provided for sm_75(0) [device_initialize_cb] tid = 1 | tracing interface entry point "ompt_set_trace_ompt" not provided for sm_75(0) [device_initialize_cb] tid = 1 | tracing interface entry point "ompt_set_trace_native" not provided for sm_75(0) [device_initialize_cb] tid = 1 | tracing interface entry point "ompt_start_trace" not provided for sm_75(0) [device_initialize_cb] tid = 1 | tracing interface entry point "ompt_pause_trace" not provided for sm_75(0) [device_initialize_cb] tid = 1 | tracing interface entry point "ompt_flush_trace" not provided for sm_75(0) [device_initialize_cb] tid = 1 | tracing interface entry point "ompt_stop_trace" not provided for sm_75(0) [device_initialize_cb] tid = 1 | tracing interface entry point "ompt_advance_buffer_cursor" not provided for sm_75(0) [device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_record_type" not provided for sm_75(0) [device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_record_ompt" not provided for sm_75(0) [device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_record_native" not provided for sm_75(0) [device_initialize_cb] tid = 1 | tracing interface entry point "ompt_get_record_abstract" not provided for sm_75(0) [device_load_cb] tid = 1 [target_emi_cb] tid = 1 | endpoint = begin | kind = target_enter_data | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:__tgt_target_data_begin_mapper [target_data_op_emi_cb] tid = 1 | endpoint = begin | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7ffe659e6e70 | dest_addr = (nil) | src_device_num = 1 | dest_device_num = 0 | optype = alloc | bytes = 400 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null) [target_data_op_emi_cb] tid = 1 | endpoint = end | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7ffe659e6e70 | dest_addr = 0x7084e4600000 | src_device_num = 1 | dest_device_num = 0 | optype = alloc | bytes = 400 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null) [target_data_op_emi_cb] tid = 1 | endpoint = begin | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7ffe659e6e70 | dest_addr = 0x7084e4600000 | src_device_num = 1 | dest_device_num = 0 | optype = transfer_to_device | bytes = 400 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null) [target_data_op_emi_cb] tid = 1 | endpoint = end | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7ffe659e6e70 | dest_addr = 0x7084e4600000 | src_device_num = 1 | dest_device_num = 0 | optype = transfer_to_device | bytes = 400 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null) [target_emi_cb] tid = 1 | endpoint = end | kind = target_enter_data | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:__tgt_target_data_begin_mapper [target_emi_cb] tid = 1 | endpoint = begin | kind = target | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = ./a.out:(null) [target_submit_emi_cb] tid = 1 | endpoint = begin [target_submit_emi_cb] tid = 1 | endpoint = end [target_emi_cb] tid = 1 | endpoint = end | kind = target | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = ./a.out:(null) [target_emi_cb] tid = 1 | endpoint = begin | kind = target_exit_data | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:__tgt_target_data_end_mapper [target_data_op_emi_cb] tid = 1 | endpoint = begin | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7084e4600000 | dest_addr = (nil) | src_device_num = 0 | dest_device_num = -1 | optype = delete | bytes = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null) [target_data_op_emi_cb] tid = 1 | endpoint = end | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7084e4600000 | dest_addr = (nil) | src_device_num = 0 | dest_device_num = -1 | optype = delete | bytes = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:(null) [target_emi_cb] tid = 1 | endpoint = end | kind = target_exit_data | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = /opt/apps/software/Core/Compilers/LLVM/18.1.2/lib/libomptarget.so.18.1:__tgt_target_data_end_mapper [implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = end | actual_parallelism = 0 | index = 1 | flags = initial [thread_end_cb] tid = 1 [my_finalize_tool] tid = 1 ``` ## Helper threads When helper threads are active, their `codeptr_ra` partially seem to point to incorrect positions in the LLVM runtime. I unfortunately don't have a reproducer ready for this one, but will update the issue. ## Reproducer To reproduce the issue, I've changed a simple "ompt-printf" tool to include address resolution via `dladdr`. While this approach is not able to resolve the exact function name each time, it is sufficient to detect the shared library the `codeptr_ra` is from. I originally encountered the issue on several different systems where the `codeptr_ra` were resolved using libbfd. To reproduce the issue: 1. Build the OpenMP Tools Interface: ```console $ clang -fPIC -shared -fopenmp ompt-tool.c -o libompt-tool.so $ export OMP_TOOL_LIBRARIES=$(pwd)/libompt-tool.so ``` 2. Compile each of the applications and run them I've tested the issue with LLVM 18.1.2 on Ubuntu 22.04 LTS with CUDA 12.4 and an NVIDIA MX550. - [reproducer.zip](https://github.com/llvm/llvm-project/files/15337964/reproducer.zip)
dhruvachak commented 3 months ago

I tried the reproducer for "target regions" on both NVPTX and AMDGPU. I am not able to reproduce the problem.

Here are the codeptrs I am seeing:

[target_emi_cb] tid = 1 | endpoint = begin | kind = target_enter_data | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = ./test:(null)
[target_data_op_emi_cb] tid = 1 | endpoint = begin | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7ffd220ef140 | dest_addr = (nil) | src_device_num = 1 | dest_device_num = 0 | optype = alloc | bytes = 400 | codeptr_ra = ./test:(null)
[target_data_op_emi_cb] tid = 1 | endpoint = end | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7ffd220ef140 | dest_addr = 0x7fc3b4600000 | src_device_num = 1 | dest_device_num = 0 | optype = alloc | bytes = 400 | codeptr_ra = ./test:(null)
[target_data_op_emi_cb] tid = 1 | endpoint = begin | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7ffd220ef140 | dest_addr = 0x7fc3b4600000 | src_device_num = 1 | dest_device_num = 0 | optype = transfer_to_device | bytes = 400 | codeptr_ra = ./test:(null)
[target_data_op_emi_cb] tid = 1 | endpoint = end | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7ffd220ef140 | dest_addr = 0x7fc3b4600000 | src_device_num = 1 | dest_device_num = 0 | optype = transfer_to_device | bytes = 400 | codeptr_ra = ./test:(null)
[target_emi_cb] tid = 1 | endpoint = end | kind = target_enter_data | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = ./test:(null)
[target_emi_cb] tid = 1 | endpoint = begin | kind = target | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = ./test:(null)
[target_emi_cb] tid = 1 | endpoint = end | kind = target | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = ./test:(null)
[target_emi_cb] tid = 1 | endpoint = begin | kind = target_exit_data | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = ./test:(null)
[target_data_op_emi_cb] tid = 1 | endpoint = begin | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7fc3b4600000 | dest_addr = (nil) | src_device_num = 0 | dest_device_num = -1 | optype = delete | bytes = 0 | codeptr_ra = ./test:(null)
[target_data_op_emi_cb] tid = 1 | endpoint = end | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7fc3b4600000 | dest_addr = (nil) | src_device_num = 0 | dest_device_num = -1 | optype = delete | bytes = 0 | codeptr_ra = ./test:(null)
[target_emi_cb] tid = 1 | endpoint = end | kind = target_exit_data | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = ./test:(null)
dhruvachak commented 3 months ago

@Thyre I see you tested using LLVM 18.1.2. Some target-related fixes went in after release 18, so that explains why I could not reproduce with top of trunk.

Thyre commented 3 months ago

@Thyre I see you tested using LLVM 18.1.2. Some target-related fixes went in after release 18, so that explains why I could not reproduce with top of trunk.

You're right! Checking with a nightly LLVM build (e949b54a5b7cd7cd0690fa126be3363a21f05a8e), the target example seems to work fine now. That's great! The two host side examples are still broken.