Open Thyre opened 4 months ago
@llvm/issue-subscribers-openmp
Author: Jan André Reuter (Thyre)
I tried the reproducer for "target regions" on both NVPTX and AMDGPU. I am not able to reproduce the problem.
Here are the codeptrs I am seeing:
[target_emi_cb] tid = 1 | endpoint = begin | kind = target_enter_data | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = ./test:(null)
[target_data_op_emi_cb] tid = 1 | endpoint = begin | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7ffd220ef140 | dest_addr = (nil) | src_device_num = 1 | dest_device_num = 0 | optype = alloc | bytes = 400 | codeptr_ra = ./test:(null)
[target_data_op_emi_cb] tid = 1 | endpoint = end | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7ffd220ef140 | dest_addr = 0x7fc3b4600000 | src_device_num = 1 | dest_device_num = 0 | optype = alloc | bytes = 400 | codeptr_ra = ./test:(null)
[target_data_op_emi_cb] tid = 1 | endpoint = begin | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7ffd220ef140 | dest_addr = 0x7fc3b4600000 | src_device_num = 1 | dest_device_num = 0 | optype = transfer_to_device | bytes = 400 | codeptr_ra = ./test:(null)
[target_data_op_emi_cb] tid = 1 | endpoint = end | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7ffd220ef140 | dest_addr = 0x7fc3b4600000 | src_device_num = 1 | dest_device_num = 0 | optype = transfer_to_device | bytes = 400 | codeptr_ra = ./test:(null)
[target_emi_cb] tid = 1 | endpoint = end | kind = target_enter_data | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = ./test:(null)
[target_emi_cb] tid = 1 | endpoint = begin | kind = target | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = ./test:(null)
[target_emi_cb] tid = 1 | endpoint = end | kind = target | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = ./test:(null)
[target_emi_cb] tid = 1 | endpoint = begin | kind = target_exit_data | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = ./test:(null)
[target_data_op_emi_cb] tid = 1 | endpoint = begin | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7fc3b4600000 | dest_addr = (nil) | src_device_num = 0 | dest_device_num = -1 | optype = delete | bytes = 0 | codeptr_ra = ./test:(null)
[target_data_op_emi_cb] tid = 1 | endpoint = end | target_task_data = 0 | target_data = 0 | host_op_id = 0 | src_addr = 0x7fc3b4600000 | dest_addr = (nil) | src_device_num = 0 | dest_device_num = -1 | optype = delete | bytes = 0 | codeptr_ra = ./test:(null)
[target_emi_cb] tid = 1 | endpoint = end | kind = target_exit_data | device_num = 0 | task_data = 6660001 | target_task_data = 0 | target_data = 0 | codeptr_ra = ./test:(null)
@Thyre I see you tested using LLVM 18.1.2. Some target-related fixes went in after release 18, so that explains why I could not reproduce with top of trunk.
You're right! Checking with a nightly LLVM build (e949b54a5b7cd7cd0690fa126be3363a21f05a8e), the target example seems to work fine now. That's great! The two host side examples are still broken.
Issue description
The OpenMP Tools Interface includes several callbacks for the host, which include a value called codeptr_ra. In the specifications, it is described like this (for example for ompt_callback_parallel_begin):
In a lot of cases, this is what LLVM is reporting to the tool. However, I have discovered a few select cases where this fails every single time.
Those can be broken down into the following categories:
- taskloop construct
- target construct
I will present one example for each of these down below. The full reproducer can be found at the end of this issue.
Taskloop construct
Taskloop constructs cause the work and task_create callbacks to report libomp.so:__kmpc_taskloop as their codeptr_ra.
One can reproduce it with this example:
Result:
Looking at __kmpc_taskloop, it seems like the call to the tool passes the return address of the runtime function called directly before, obtained via OMPT_GET_RETURN_ADDRESS, and not the one in user code.
Cancelling parallel regions
When a parallel region is cancelled, the pointer for the implicit barrier will point to an internal method and not the user code:
Here, the codeptr_ra points to __kmpc_cancel_barrier, which calls the correct barrier.
Target regions
Both target regions and data transfers (ompt_target_emi / ompt_target_data_emi) seem to incorrectly return libomptarget.so for their codeptr_ra.
Helper threads
When helper threads are active, their codeptr_ra values partially seem to point to incorrect positions in the LLVM runtime. I consider this okay in this particular scenario, since the parallel and masked regions are generated by the runtime and not by the user. Here, one can question whether these callbacks should even be dispatched to the tool.
Here's code to reproduce the issue:
The barriers, parallel and masked callbacks point to the runtime.
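The symptom described for the taskloop and cancel-barrier cases is consistent with a return address being captured one call level too deep. The sketch below illustrates that effect in isolation. It assumes that the runtime's return-address macro behaves like the GCC/Clang builtin __builtin_return_address(0); the functions runtime_entry and inner_dispatch and the variable reported_ra are hypothetical stand-ins and are not taken from the LLVM sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for runtime internals; none of this is LLVM code. */
static const void *reported_ra = NULL; /* what the "tool" would receive */

/* Plays the role of an inner runtime routine that notifies the tool. */
__attribute__((noinline)) static void inner_dispatch(void) {
    /* Level 0 is the caller of this function, i.e. the runtime wrapper. */
    reported_ra = __builtin_return_address(0);
}

/* Plays the role of the runtime entry point called from user code. */
__attribute__((noinline)) static void runtime_entry(const void **user_ra) {
    assert(user_ra != NULL);
    /* Captured here, level 0 is the call site in user code. */
    *user_ra = __builtin_return_address(0);
    inner_dispatch();
    /* Compiler barrier so the call above is not turned into a tail call. */
    __asm__ volatile("" ::: "memory");
}
```

Calling runtime_entry(&ra) from user code leaves ra pointing into the caller, while reported_ra points into runtime_entry itself. A tool that receives the latter would resolve it to the runtime library rather than to the application.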
Reproducer
To reproduce the issue, I've changed a simple "ompt-printf" tool to include address resolution via dladdr. While this approach is not able to resolve the exact function name each time, it is sufficient to detect the shared library the codeptr_ra is from. I originally encountered the issue on several different systems where the codeptr_ra values were resolved using libbfd.
To reproduce the issue:
I've tested the issue with LLVM 18.1.2 on Ubuntu 22.04 LTS with CUDA 12.4 and an NVIDIA MX550.
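For readers who want to see what the dladdr-based resolution mentioned in the Reproducer section might look like, here is a minimal sketch. It is not the code from the modified tool: the helper name resolve_codeptr and the "(unknown)" fallback are assumptions made for illustration, and the "object:symbol" layout merely mirrors the codeptr_ra fields in the log output above. It relies on glibc's dladdr, which needs _GNU_SOURCE and, on older glibc versions, linking with -ldl.

```c
#define _GNU_SOURCE /* dladdr and Dl_info are GNU extensions in glibc */
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "<object>:<symbol>" for codeptr_ra into buf.
 * Returns 1 if the address could be mapped to a loaded object, else 0. */
static int resolve_codeptr(const void *codeptr_ra, char *buf, size_t len) {
    Dl_info info;

    assert(buf != NULL && len > 0);
    if (codeptr_ra == NULL || dladdr(codeptr_ra, &info) == 0 ||
        info.dli_fname == NULL) {
        snprintf(buf, len, "(unknown)");
        return 0;
    }
    /* dli_sname is NULL when no exported symbol covers the address, so
     * only the containing object (executable or shared library) is known. */
    snprintf(buf, len, "%s:%s", info.dli_fname,
             info.dli_sname != NULL ? info.dli_sname : "(null)");
    return 1;
}
```

Inside a callback, one would call resolve_codeptr(codeptr_ra, buf, sizeof buf) and print buf. Even when the symbol part is "(null)", the object part is enough to tell whether the address lies in the application or in a runtime library such as libomp.so.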