Closed Thyre closed 10 months ago
@llvm/issue-subscribers-openmp
Hey and thanks for bringing this to our attention!
So, from my perspective this issue is only affecting the reported target pointer for the DataOp alloc EMI callback (i.e. optype=1) -- please confirm.
I don't completely follow where NVHPC is surpassing the amount of provided information in that case, since for optype=1 endpoint=1 it will report dest=(nil), like the others.
Could you please point out the difference?
Other than that I'm able to reproduce and also fix the issue.
Callback DataOp EMI: endpoint=1 optype=1 target_task_data=0x175a658 (0x0) target_data=0x7f0e0ff4c368 (0x8000000000000001) host_op_id=0x7f0e0ff4c380 (0x8000000000000002) src=0x7fffca5c9b90 src_device_num=8 dest=(nil) dest_device_num=0 bytes=400000 code=0x7f0e11418bd1
Callback DataOp EMI: endpoint=2 optype=1 target_task_data=0x175a658 (0x0) target_data=0x7f0e0ff4c368 (0x8000000000000001) host_op_id=0x7f0e0ff4c380 (0x8000000000000002) src=0x7fffca5c9b90 src_device_num=8 dest=0x7f0e08a00000 dest_device_num=0 bytes=400000 code=0x7f0e11418bd1
Callback DataOp EMI: endpoint=1 optype=1 target_task_data=0x175a658 (0x0) target_data=0x7f0e0ff4c368 (0x8000000000000001) host_op_id=0x7f0e0ff4c380 (0x8000000000000004) src=0x7fffca568110 src_device_num=8 dest=(nil) dest_device_num=0 bytes=400000 code=0x7f0e11418bd1
Callback DataOp EMI: endpoint=2 optype=1 target_task_data=0x175a658 (0x0) target_data=0x7f0e0ff4c368 (0x8000000000000001) host_op_id=0x7f0e0ff4c380 (0x8000000000000004) src=0x7fffca568110 src_device_num=8 dest=0x7f0e08a62000 dest_device_num=0 bytes=400000 code=0x7f0e11418bd1
So, this will only have an effect on DataOp EMI callbacks where endpoint=2 and optype=1 (ompt_target_data_alloc).
Is this an acceptable solution / the anticipated output?
(I'll open a Phabricator review once we're ready here.)
Thanks for taking the time esp. to provide an elaborate issue description -- much appreciated!
So, from my perspective this issue is only affecting the reported target pointer for the DataOp alloc EMI callback (i.e. optype=1) -- please confirm.
Yes, you're right. This only affects optype=1. All other cases seem to be fine, as far as I have seen / tested.
I don't completely follow where the NVHPC is surpassing the amount of provided information in that case. Since for optype=1 endpoint=1 it will report dest=(nil), like the others. Could you please point out the difference?
To be honest, I misread the output of NVHPC. There's no difference. Sorry for the confusion.
Callback DataOp EMI: endpoint=1 optype=1 target_task_data=0x175a658 (0x0) target_data=0x7f0e0ff4c368 (0x8000000000000001) host_op_id=0x7f0e0ff4c380 (0x8000000000000002) src=0x7fffca5c9b90 src_device_num=8 dest=(nil) dest_device_num=0 bytes=400000 code=0x7f0e11418bd1
Callback DataOp EMI: endpoint=2 optype=1 target_task_data=0x175a658 (0x0) target_data=0x7f0e0ff4c368 (0x8000000000000001) host_op_id=0x7f0e0ff4c380 (0x8000000000000002) src=0x7fffca5c9b90 src_device_num=8 dest=0x7f0e08a00000 dest_device_num=0 bytes=400000 code=0x7f0e11418bd1
Callback DataOp EMI: endpoint=1 optype=1 target_task_data=0x175a658 (0x0) target_data=0x7f0e0ff4c368 (0x8000000000000001) host_op_id=0x7f0e0ff4c380 (0x8000000000000004) src=0x7fffca568110 src_device_num=8 dest=(nil) dest_device_num=0 bytes=400000 code=0x7f0e11418bd1
Callback DataOp EMI: endpoint=2 optype=1 target_task_data=0x175a658 (0x0) target_data=0x7f0e0ff4c368 (0x8000000000000001) host_op_id=0x7f0e0ff4c380 (0x8000000000000004) src=0x7fffca568110 src_device_num=8 dest=0x7f0e08a62000 dest_device_num=0 bytes=400000 code=0x7f0e11418bd1
So, this will only have an effect on DataOp EMI callbacks where endpoint=2 and optype=1 (alloc). Is this an acceptable solution / the anticipated output?
That looks perfect! With this, we should be able to use the same code in Score-P we've been using until now.
To be honest, I misread the output of NVHPC. There's no difference. Sorry for the confusion.
No worries, I just wanted to make sure I understood the situation.
That looks perfect! With this, we should be able to use the same code in Score-P we've been using until now.
That's great to hear!
I have another question, since I'm thinking about adapting the corresponding OMPT (EMI) testcases:
Should dest= always report a non-null value when endpoint=2 optype=1?
I have another question, since I'm thinking about adapting the corresponding OMPT (EMI) testcases: Should dest= always report a non-null value when endpoint=2 optype=1?
Looking at the OpenMP specifications, we should see the data address after the operation has finished. In this case, the operation is the allocation of data. The specification allows data aggregation to reduce the number of callbacks though, which means that we may see fewer ompt_callback_target_data_op calls than variables copied to the target device.
The only case where I wouldn't expect a pointer to be returned in the callback is when the allocation fails for some reason (for example insufficient memory). Maybe there's another case I haven't thought of.
Thanks for the quick response!
With that info, I guess I'll check that there are no null values:
/// CHECK: Callback DataOp EMI: endpoint=2 optype=1
/// CHECK-NOT: dest=(nil)
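The same condition can also be checked per line outside of FileCheck. The following is a minimal sketch in C, assuming the tool prints records in the "Callback DataOp EMI: ..." format quoted earlier in this thread; the function name alloc_end_has_null_dest is hypothetical and not part of any test suite.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true when `line` is an alloc end record (endpoint=2 optype=1)
 * whose dest field is (nil) or missing. The field names follow the tool
 * output quoted in this thread; the function name is hypothetical. */
static bool alloc_end_has_null_dest(const char *line) {
    if (strstr(line, "Callback DataOp EMI: endpoint=2 optype=1 ") == NULL)
        return false; /* not an alloc end record, nothing to check */

    /* Search for " dest=" so that "dest_device_num=" is not matched. */
    const char *dest = strstr(line, " dest=");
    if (dest == NULL)
        return true; /* treat a missing field like a null pointer */

    /* Skip the 6 characters of " dest=" and compare the value. */
    return strncmp(dest + 6, "(nil)", 5) == 0;
}
```

Feeding every line of the tool output through this function and failing on the first true result checks the dest value on the same line as the alloc end record, so the dest=(nil) of a following endpoint=1 record is not flagged.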
Phabricator review is up: https://reviews.llvm.org/D157996
@Thyre I took the liberty to directly subscribe you to the review :)
Description
Recently, LLVM has added parts of the target callbacks of the OMPT interface. During tests, I found a regression compared to the implementation previously found in ROCm and aomp.
The callback ompt_callback_target_data_op is called when memory is allocated on a selected target device. The optype matches ompt_target_data_alloc. We get the number of bytes allocated, but do not receive the allocated pointer during either ompt_scope_begin or ompt_scope_end in the _emi callbacks. Instead, both pointers have a value of 0 when using omp_target_alloc. When using #pragma omp target enter data map([...]) the field src_addr is set to the host pointer, but we still do not get the device pointer. The pointer is correctly set on data operations and during the delete operation.
It's worth noting that the OpenMP specifications do not specifically state that those pointers need to be passed to the callbacks. However, without those pointers, tools have a hard time tracking memory allocations correctly, only knowing the amount of memory.
Other runtimes (NVHPC, ROCm) solve this issue by passing the allocated pointer during ompt_target_data_alloc with endpoint = ompt_scope_end.
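To illustrate why a tool needs the pointer at ompt_scope_end, here is a minimal sketch in C of the bookkeeping a tool might do. It is not the Score-P implementation. The helper track_data_op, the table live_allocs, and the constant names are hypothetical; the numeric values mirror ompt_scope_endpoint_t and ompt_target_data_op_t from omp-tools.h. Which callback argument carries the device pointer for a given optype is left to the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Values mirror ompt_scope_begin/ompt_scope_end and
 * ompt_target_data_alloc/ompt_target_data_delete in omp-tools.h.
 * The constant names themselves are local to this sketch. */
enum { SCOPE_BEGIN = 1, SCOPE_END = 2 };
enum { DATA_ALLOC = 1, DATA_DELETE = 4 };

#define MAX_ALLOCS 64

typedef struct {
    const void *addr; /* device pointer, NULL marks a free slot */
    size_t bytes;     /* size reported by the callback */
} alloc_entry;

static alloc_entry live_allocs[MAX_ALLOCS];
static size_t live_bytes = 0;

/* Hypothetical helper that a data-op EMI callback could forward to.
 * Without a device pointer there is no key to match a later delete
 * against, so such events are ignored. */
static void track_data_op(int endpoint, int optype,
                          const void *dev_addr, size_t bytes) {
    if (endpoint != SCOPE_END || dev_addr == NULL)
        return;

    if (optype == DATA_ALLOC) {
        /* Store the allocation in the first free slot. */
        for (size_t i = 0; i < MAX_ALLOCS; ++i) {
            if (live_allocs[i].addr == NULL) {
                live_allocs[i].addr = dev_addr;
                live_allocs[i].bytes = bytes;
                live_bytes += bytes;
                return;
            }
        }
    } else if (optype == DATA_DELETE) {
        /* Release the slot whose device pointer matches. */
        for (size_t i = 0; i < MAX_ALLOCS; ++i) {
            if (live_allocs[i].addr == dev_addr) {
                live_bytes -= live_allocs[i].bytes;
                live_allocs[i].addr = NULL;
                live_allocs[i].bytes = 0;
                return;
            }
        }
    }
}
```

With a null pointer at scope end, the alloc branch is never taken, so the tool can only sum byte counts and cannot attribute a later delete to a specific allocation.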
Note: The callback ompt_callback_target_data_op also doesn't pass the pointer to the tools interface. However, since the callback is dispatched before the actual allocation I wouldn't necessarily consider this as an issue. ROCm and aomp have dispatched the callbacks the same way. Only NVHPC somehow knows the allocated pointer already and passes it in both cases.
Reproducer
The following code can be used to reproduce the issue. The OMPT interface was mostly copied from an aomp smoke test with small changes to prevent the tool from aborting on omp_target_alloc.
Running the tool with Clang, we see the following output:
Notice that the field dest stays (nil) for the whole allocation process. This isn't the case with other runtimes:
ROCm 5.6:
aomp 17.0-3:
Both ROCm and aomp do not dispatch ompt_callback_target for #pragma omp target [enter|exit] data correctly, but the data operations contain the pointer during allocation.
NVHPC 23.7:
The passed pointers in NVHPC look a bit weird, but in general, pointers are passed to the callbacks.