llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.9k stars 11.51k forks source link

[OpenMP][AArch64] mapping/target_derefence_array_pointrs.cpp failing #69606

Open ceseo opened 10 months ago

ceseo commented 10 months ago

This test fails with/without LTO on the current main:

# | /home/GitHub/llvm/llvm-project/openmp/libomptarget/test/mapping/target_derefence_array_pointrs.cpp:28:12: error: CHECK: expected string not found in input
# |  // CHECK: 3
# |            ^
# | <stdin>:1:103: note: scanning from here
# | Libomptarget message: explicit extension not allowed: host address specified is 0x0000ffffd8efdd80 (12 bytes), but device allocation maps to host at 0x0000ffffd8efdd80 (8 bytes)
# |                                                                                                       ^
# | <stdin>:1:104: note: possible intended match here
# | Libomptarget message: explicit extension not allowed: host address specified is 0x0000ffffd8efdd80 (12 bytes), but device allocation maps to host at 0x0000ffffd8efdd80 (8 bytes)

Looking at the code:

void foo(int **t1d) {
  int ***t2d = &t1d;
  int ****t3d = &t2d;
  *t1d = (int *)malloc(3 * sizeof(int));
  int j, a = 0, b = 0;

  for (j = 0; j < 3; j++)
    (*t1d)[j] = 0;
#pragma omp target map(tofrom : (*t1d)[0 : 3])
  { (*t1d)[1] = 1; }
  // CHECK: 1
  printf("%d\n", (*t1d)[1]);
#pragma omp target map(tofrom : (**t2d)[0 : 3])
  { (**t2d)[1] = 2; }
  // CHECK: 2
  printf("%d\n", (**t2d)[1]);
#pragma omp target map(tofrom : (***t3d)[0 : 3])
  { (***t3d)[1] = 3; }
  // CHECK: 3
  printf("%d\n", (***t3d)[1]);
#pragma omp target map(tofrom : (**t1d))
  { (*t1d)[0] = 4; }
  // CHECK: 4
  printf("%d\n", (*t1d)[0]);
#pragma omp target map(tofrom : (*(*(t1d + a) + b)))
  { *(*(t1d + a) + b) = 5; }
  // CHECK: 5
  printf("%d\n", *(*(t1d + a) + b));
}

Looks like ****t3d is clashing with ***t2d.

Is this testcase correct?

*Note: I cannot reproduce this on x86.

llvmbot commented 10 months ago

@llvm/issue-subscribers-openmp

Author: Carlos Eduardo Seo (ceseo)

This test fails with/without LTO on the current `main`: ``` # | /home/GitHub/llvm/llvm-project/openmp/libomptarget/test/mapping/target_derefence_array_pointrs.cpp:28:12: error: CHECK: expected string not found in input # | // CHECK: 3 # | ^ # | <stdin>:1:103: note: scanning from here # | Libomptarget message: explicit extension not allowed: host address specified is 0x0000ffffd8efdd80 (12 bytes), but device allocation maps to host at 0x0000ffffd8efdd80 (8 bytes) # | ^ # | <stdin>:1:104: note: possible intended match here # | Libomptarget message: explicit extension not allowed: host address specified is 0x0000ffffd8efdd80 (12 bytes), but device allocation maps to host at 0x0000ffffd8efdd80 (8 bytes) ``` Looking at the code: ``` void foo(int **t1d) { int ***t2d = &t1d; int ****t3d = &t2d; *t1d = (int *)malloc(3 * sizeof(int)); int j, a = 0, b = 0; for (j = 0; j < 3; j++) (*t1d)[j] = 0; #pragma omp target map(tofrom : (*t1d)[0 : 3]) { (*t1d)[1] = 1; } // CHECK: 1 printf("%d\n", (*t1d)[1]); #pragma omp target map(tofrom : (**t2d)[0 : 3]) { (**t2d)[1] = 2; } // CHECK: 2 printf("%d\n", (**t2d)[1]); #pragma omp target map(tofrom : (***t3d)[0 : 3]) { (***t3d)[1] = 3; } // CHECK: 3 printf("%d\n", (***t3d)[1]); #pragma omp target map(tofrom : (**t1d)) { (*t1d)[0] = 4; } // CHECK: 4 printf("%d\n", (*t1d)[0]); #pragma omp target map(tofrom : (*(*(t1d + a) + b))) { *(*(t1d + a) + b) = 5; } // CHECK: 5 printf("%d\n", *(*(t1d + a) + b)); } ``` Looks like `****t3d` is clashing with `***t2d`. Is this testcase correct? *Note: I cannot reproduce this on x86.
jdoerfert commented 10 months ago

Can you describe your platform. Is this host offloading? @doru1004 Did your map checks cause this?

doru1004 commented 10 months ago

@doru1004 Did your map checks cause this?

My patch is still in review, hasn't landed yet.

ceseo commented 10 months ago

Can you describe your platform. Is this host offloading? @doru1004 Did your map checks cause this?

AArch64, Ampere Neoverse N1.

ceseo commented 7 months ago

I did some digging on this and found out two things:

  1. This doesn't fail at any optimization level above -O0.
  2. This doesn't fail if the variables in the test are 64-bit integers.

Looking at the IR, the only relevant difference between x86_64 and aarch64 is data alignment.

Any ideas about where to look?

ceseo commented 6 months ago

I think I figured out what's going on.

The test executes the functions in this order:

int main() {
  int *data = 0;
  foo(&data);
  bar();
  xoo();
  yoo(&data);
}

foo() (below) executes correctly.

void foo(int **t1d) {
  int ***t2d = &t1d;
  int ****t3d = &t2d;
  *t1d = (int *)malloc(3 * sizeof(int));
  int j, a = 0, b = 0;

  for (j = 0; j < 3; j++)
    (*t1d)[j] = 0;
#pragma omp target map(tofrom : (*t1d)[0 : 3])
  { (*t1d)[1] = 1; }
  // CHECK: 1
  printf("%d\n", (*t1d)[1]);
#pragma omp target map(tofrom : (**t2d)[0 : 3])
  { (**t2d)[1] = 2; }
  // CHECK: 2
  printf("%d\n", (**t2d)[1]);
#pragma omp target map(tofrom : (***t3d)[0 : 3])
  { (***t3d)[1] = 3; }
  // CHECK: 3
  printf("%d\n", (***t3d)[1]);
#pragma omp target map(tofrom : (**t1d))
  { (*t1d)[0] = 4; }
  // CHECK: 4
  printf("%d\n", (*t1d)[0]);
#pragma omp target map(tofrom : (*(*(t1d + a) + b)))
  { *(*(t1d + a) + b) = 5; }
  // CHECK: 5
  printf("%d\n", *(*(t1d + a) + b));
}

Mapping after foo() looks like:

omptarget --> OpenMP Host-Device pointer mappings after block at target_derefence_array_pointrs.cpp:37:1:
omptarget --> Host Ptr           Target Ptr         Size (B) DynRefCount HoldRefCount Declaration
omptarget --> 0x0000ffffffffe980 0x0000aaaaaab5e1a0 8        1           0            *t3d at target_derefence_array_pointrs.cpp:15:11
omptarget --> 0x0000ffffffffe988 0x0000aaaaaab40e50 8        2           0            *t2d at target_derefence_array_pointrs.cpp:14:10
omptarget --> 0x0000ffffffffe9b8 0x0000aaaaaab39920 8        5           0            *t1d at target_derefence_array_pointrs.cpp:13:16
5

When bar() is executed, the mapping for *t3d is not freed. (**a)[:3] is mapped to the same address, so there is a clash.

omptarget --> Entering OpenMP kernel at target_derefence_array_pointrs.cpp:52:1 with 2 arguments:
omptarget --> tofrom(*a)[8]
omptarget --> tofrom((**a)[:3])[12]
omptarget --> Entry  0: Base=0x0000ffffffffe978, Begin=0x0000ffffffffe978, Size=8, Type=0x23, Name=*a
omptarget --> Entry  1: Base=0x0000ffffffffe978, Begin=0x0000ffffffffe980, Size=12, Type=0x13, Name=(**a)[:3]

Since the data size is larger, getTargetPointer will fail with the explicit extension not allowed error.

omptarget --> Looking up mapping(HstPtrBegin=0x0000ffffffffe980, Size=12)...
omptarget --> WARNING: Pointer is already mapped but section extends beyond mapped region
omptarget message: explicit extension not allowed: host address specified is 0x0000ffffffffe980 (12 bytes), but device allocation maps to host at 0x0000ffffffffe980 (8 bytes)

I don't know yet why this only happens on AArch64, though.

shiltian commented 6 months ago

On a side unrelated note (probably to me), we could potentially "track" this issue by enabling OpenMP target offloading on macOS, only for the generic plugin, not GPU version, given we have many Apple Silicon macOS users now.

jhuber6 commented 6 months ago

On a side unrelated note (probably to me), we could potentially "track" this issue by enabling OpenMP target offloading on macOS, only for the generic plugin, not GPU version, given we have many Apple Silicon macOS users now.

This will require some work in the clang-linker-wrapper. Someone will need to add something like in https://stackoverflow.com/a/22366882 to where we handle this for ELF and COFF. Additionally, we will need to manually strip the .llvm.offloading section because the darwin linker does not support SHF_EXCLUDE type flags which lets the linker delete them for you. I think we'd need to name the output section something different and then just run llvm-objcopy --delete-section .llvm.offloading or something.