llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.58k stars 11.81k forks source link

Target region failure when using is_device_ptr with class member pointer #41536

Open cdaley opened 5 years ago

cdaley commented 5 years ago
Bugzilla Link 42191
Version unspecified
OS Linux
CC @alexey-bataev,@cdaley,@jdoerfert

Extended Description

Hello all,

I encounter a target region failure when using a class member pointer in the is_device_ptr clause.

clang++ -UFUNC_DECL -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda is_device_ptr.cpp -o is_device_ptr && srun -n 1 ./is_device_ptr Libomptarget fatal error 1: failure of target construct while offloading is mandatory srun: error: cgpu01: task 0: Exited with exit code 1 srun: Terminating job step 156071.85

The target region is successful when I declare the pointer inside the C++ function instead (using the macro FUNC_DECL).

clang++ -DFUNC_DECL -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda is_device_ptr.cpp -o is_device_ptr && srun -n 1 ./is_device_ptr The correct answer is 2....2

I think this is a bug but I am happy to be corrected. The code is as follows.

cat is_device_ptr.cpp

include

include

include

class Widget { public:

ifndef FUNC_DECL

int _size; int _hnum; int _dnum; double* d_ptr; std::vector data;

endif

Widget(int size) {

ifdef FUNC_DECL

int _size;
int _hnum;
int _dnum;
double* d_ptr;
std::vector<double> data;

endif

_size = size;
_dnum = omp_get_default_device();
_hnum = omp_get_initial_device();
d_ptr = static_cast<double*>(omp_target_alloc(size*sizeof(double), _dnum));
data.resize(size, 1.0);

omp_target_memcpy(d_ptr, data.data(), size, 0, 0, _dnum, _hnum);
double a = 2.0;

pragma omp target is_device_ptr(d_ptr)

pragma omp teams distribute parallel for

for(int i=0; i<size; i++) {
  d_ptr[i] *= a;
}

omp_target_memcpy(data.data(), d_ptr, size, 0, 0, _hnum, _dnum);
omp_target_free(d_ptr, _dnum);
std::cout << "The correct answer is 2...." << data[0] << '\n';

} };

int main() { Widget widget(100000); return 0; }

I am using the following version of Clang

clang++ -v clang version 9.0.0 (https://github.com/llvm-mirror/clang c65fb39b929fc1a0ed7bbb93bab40fe53086cbd7) (https://github.com/llvm-mirror/llvm 891c55ccc57a9dd7db74e105bf02e70b3151b9f6) Target: x86_64-unknown-linux-gnu Thread model: posix InstalledDir: /project/projectdirs/m1759/csdaley/software/cgpu/llvm/9.0.0-git_20190606/bin Found candidate GCC installation: /opt/gcc/7.3.0/snos/lib/gcc/x86_64-suse-linux/7.3.0 Selected GCC installation: /opt/gcc/7.3.0/snos/lib/gcc/x86_64-suse-linux/7.3.0 Candidate multilib: .;@m64 Selected multilib: .;@m64 Found CUDA installation: /global/common/cori/software/cuda/9.2, version 9.2

Thanks, Chris

cdaley commented 5 years ago

I think, there is a problem in your code when d_ptr is a member data. You need to map the whole object and transfer it to the device or allocate memory for it on the device. Without it, when you're trying to access d_ptr on the GPU, you're dereferencing this pointer, which is nullptr (since d_ptr is a member, you need at first to dereference this pointer to get access to d_ptr).

Thanks Alexey,

Can you please show me the code that should be used when d_ptr is member data?

Is this what you mean?

pragma omp target enter data map(to:this[0:1])

pragma omp target is_device_ptr(this->d_ptr)

pragma omp teams distribute parallel for

for(int i=0; i<size; i++) {
  d_ptr[i] *= a;
}
  • This code fails in the same way.

Thanks, Chris

Just this would be enough:

pragma omp target map(to:this[0:1])

pragma omp teams distribute parallel for

for(int i=0; i<size; i++) {
  d_ptr[i] *= a;
}

Thank you. This works as we expect. Please close this bug report if you are confident that "is_device_ptr" should not work on member data pointers.

Thanks, Chris

alexey-bataev commented 5 years ago

I think, there is a problem in your code when d_ptr is a member data. You need to map the whole object and transfer it to the device or allocate memory for it on the device. Without it, when you're trying to access d_ptr on the GPU, you're dereferencing this pointer, which is nullptr (since d_ptr is a member, you need at first to dereference this pointer to get access to d_ptr).

Thanks Alexey,

Can you please show me the code that should be used when d_ptr is member data?

Is this what you mean?

pragma omp target enter data map(to:this[0:1])

pragma omp target is_device_ptr(this->d_ptr)

pragma omp teams distribute parallel for

for(int i=0; i<size; i++) {
  d_ptr[i] *= a;
}
  • This code fails in the same way.

Thanks, Chris

Just this would be enough:

pragma omp target map(to:this[0:1])

pragma omp teams distribute parallel for

for(int i=0; i<size; i++) {
  d_ptr[i] *= a;
}
cdaley commented 5 years ago

I think, there is a problem in your code when d_ptr is a member data. You need to map the whole object and transfer it to the device or allocate memory for it on the device. Without it, when you're trying to access d_ptr on the GPU, you're dereferencing this pointer, which is nullptr (since d_ptr is a member, you need at first to dereference this pointer to get access to d_ptr).

Thanks Alexey,

Can you please show me the code that should be used when d_ptr is member data?

Is this what you mean?

pragma omp target enter data map(to:this[0:1])

pragma omp target is_device_ptr(this->d_ptr)

pragma omp teams distribute parallel for

for(int i=0; i<size; i++) {
  d_ptr[i] *= a;
}

Thanks, Chris

alexey-bataev commented 5 years ago

I think, there is a problem in your code when d_ptr is a member data. You need to map the whole object and transfer it to the device or allocate memory for it on the device. Without it, when you're trying to access d_ptr on the GPU, you're dereferencing this pointer, which is nullptr (since d_ptr is a member, you need at first to dereference this pointer to get access to d_ptr).