Open ye-luo opened 5 years ago
On the host we generate an `omp for`-like loop for the `distribute`, which binds to the outer parallel. Thus, each thread of the outer parallel executes only a single iteration of the `A[i] = i` loop instead of the entire thing.
If an `omp for`-like loop is generated for `distribute`, an `omp parallel`-like region should be generated for `teams`. Binding to the outer parallel is clearly wrong.
Agreed.
I tried compiling this without offload:
$ clang++ -fopenmp debug.cpp
$ OMP_NUM_THREADS=2 ./a.out
tid = 0 0 1 0 0
tid = 1 0 0 2 3
Probably something is wrong with libomp already.
After some investigation, it seems to me that the problem is in libomp. When we schedule the distribute loop, the tid is taken from the outer threads (though it should be set to 0 in all cases) and the number of threads is taken from the outer parallel region (though, it seems to me, it should be set to 1). It would be good to check whether libomp works correctly here.
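The effect described here can be modeled without OpenMP at all. The sketch below is a simplified model of static loop partitioning, not libomp's actual scheduling code; `static_range` and `run_distribute` are hypothetical names introduced only for illustration. Feeding it the outer region's thread id and thread count gives each thread just its own slice of the loop, while passing tid 0 and a thread count of 1 makes every caller run the whole loop.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <utility>

constexpr int kSize = 4;  // same array length as in the report

// Simplified static partition (an assumption, not libomp's code): returns
// the half-open range [first, last) of the `n` iterations that thread `tid`
// out of `nthreads` executes.
std::pair<int, int> static_range(int tid, int nthreads, int n) {
  assert(nthreads > 0 && tid >= 0 && tid < nthreads);
  const int base = n / nthreads;
  const int rem = n % nthreads;
  const int first = tid * base + std::min(tid, rem);
  const int last = first + base + (tid < rem ? 1 : 0);
  return {first, last};
}

// Models one thread's private array A after the "distribute" loop, given
// the (tid, nthreads) pair that the scheduler was handed.
std::array<int, kSize> run_distribute(int tid, int nthreads) {
  std::array<int, kSize> A{};  // zero-initialized, like the A[i] = 0 loop
  const std::pair<int, int> r = static_range(tid, nthreads, kSize);
  for (int i = r.first; i < r.second; ++i) A[i] = i;
  return A;
}
```

With two outer threads, `run_distribute(0, 2)` and `run_distribute(1, 2)` yield the arrays `0 1 0 0` and `0 0 2 3`, the same values as the incorrect output reported in this thread, whereas `run_distribute(0, 1)` yields `0 1 2 3`.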
Also, it seems to me that there is a problem with the runtime; I will investigate this.
The problem is caused by the `__kmpc_push_target_tripcount` function. It is not thread-safe and has a data race, which causes incorrect results.
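To make this kind of race concrete, here is a deterministic, single-threaded replay of one problematic interleaving. It is only a sketch under stated assumptions: `SharedSlot`, `PerThreadSlots`, and `replay_interleaving` are hypothetical names, and nothing here is taken from the runtime's source. The assumption is that a trip count is pushed by one call and consumed by a later launch call, so a single shared slot can be overwritten by another thread in between.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical model: one pending trip count shared by all host threads.
struct SharedSlot {
  std::uint64_t pending = 0;
  void push(int /*tid*/, std::uint64_t n) { pending = n; }
  std::uint64_t take(int /*tid*/) { return pending; }
};

// Hypothetical alternative: one pending trip count per host thread.
// Real multithreaded code would also need a lock around the map, or
// thread-local storage; this replay is single-threaded on purpose.
struct PerThreadSlots {
  std::map<int, std::uint64_t> pending;
  void push(int tid, std::uint64_t n) { pending[tid] = n; }
  std::uint64_t take(int tid) {
    const auto it = pending.find(tid);
    if (it == pending.end()) return 0;  // nothing was pushed by this thread
    const std::uint64_t n = it->second;
    pending.erase(it);
    return n;
  }
};

// Replays: thread 0 pushes 4, thread 1 pushes 8, then thread 0 launches.
// Returns the trip count that thread 0's launch observes.
template <class Slots>
std::uint64_t replay_interleaving() {
  Slots slots;
  slots.push(0, 4);
  slots.push(1, 8);  // lands between thread 0's push and its launch
  return slots.take(0);
}
```

In this replay the shared slot hands thread 0 the value pushed by thread 1, while the per-thread variant returns thread 0's own value.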
Extended Description
include
include
The attached code gives correct results with `OMP_TARGET_OFFLOAD=MANDATORY` but wrong results with `OMP_TARGET_OFFLOAD=DISABLED`:
$ OMP_NUM_THREADS=2 OMP_TARGET_OFFLOAD=MANDATORY ./a.out
tid = 1 0 1 2 3
tid = 0 0 1 2 3
$ OMP_NUM_THREADS=2 OMP_TARGET_OFFLOAD=DISABLED ./a.out
tid = 0 0 1 0 0
tid = 1 0 0 2 3
int main()
{
  const int size = 4;
#pragma omp parallel
  {
    int A[size];
    for (int i = 0; i < size; i++)
      A[i] = 0;
  }
}
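The listing above does not show the offloaded loop that the discussion refers to. The following is a minimal sketch of what such a reproducer might look like, not the attached file itself: the exact directive and `map` clause, the function name `run_nested_target`, and the choice to return each thread's array instead of printing it are all assumptions. It should build with or without `-fopenmp`, since the pragmas are simply ignored when OpenMP is disabled.

```cpp
#include <cassert>
#include <vector>

// Runs a target region nested inside a host parallel region and returns
// every host thread's private copy of A (hypothetical helper, see lead-in).
std::vector<std::vector<int>> run_nested_target() {
  const int size = 4;
  std::vector<std::vector<int>> results;

#pragma omp parallel
  {
    int A[size];  // private to each host thread
    for (int i = 0; i < size; i++)
      A[i] = 0;

    // Assumed form of the offloaded loop: every host thread should see
    // all `size` iterations executed on its own A.
#pragma omp target teams distribute map(tofrom : A[:size])
    for (int i = 0; i < size; i++)
      A[i] = i;

    std::vector<int> copy(A, A + size);
#pragma omp critical
    results.push_back(copy);
  }
  return results;
}
```

On a correct implementation every returned array is `0 1 2 3`, regardless of `OMP_NUM_THREADS` or `OMP_TARGET_OFFLOAD`; a toolchain affected by this bug would be expected to return partially filled arrays when offload is disabled.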