Open Meinersbur opened 2 years ago
@llvm/issue-subscribers-openmp
I can get one failure in roughly 100+ runs. I'll defer the fix until we get other reports that are easier to reproduce.
I think this problem is more likely caused by bugs in the existing tasking implementation. I made some changes to the function BlockMatMul_TargetNowait:
int BlockMatMul_TargetNowait(BlockMatrix &A, BlockMatrix &B, BlockMatrix &C) {
#pragma omp parallel
  {
    int tid = omp_get_thread_num();
    if (tid == 0) {
      for (int i = 0; i < N / BS; ++i)
        for (int j = 0; j < N / BS; ++j) {
          int *BlockC = C.GetBlock(i, j);
          for (int k = 0; k < N / BS; ++k) {
            int *BlockA = A.GetBlock(i, k);
            int *BlockB = B.GetBlock(k, j);
#pragma omp task depend(in : BlockA[0], BlockB[0]) depend(inout : BlockC[0])
            {
              // clang-format off
#pragma omp target map(to: BlockA[:BS * BS], BlockB[:BS * BS]) \
                   map(tofrom: BlockC[:BS * BS])
              // clang-format on
#pragma omp parallel for
              for (int ii = 0; ii < BS; ii++)
                for (int jj = 0; jj < BS; jj++)
                  for (int kk = 0; kk < BS; ++kk)
                    BlockC[ii + jj * BS] +=
                        BlockA[ii + kk * BS] * BlockB[kk + jj * BS];
            }
          }
        }
    }
  }
  return 0;
}
No nowait is used, which means no hidden helper task is involved. These are now purely regular OpenMP tasks, and I can still observe the failure.
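One possible way to narrow this down further would be to drop the target region as well and repeat the same task graph on the host, comparing each run against a serial reference. The following is only a minimal sketch under stated assumptions: `HostBlockMatrix`, `MulBlock`, `Fill`, `BlockMatMul_HostTasks`, `BlockMatMul_Serial`, and `CountMismatchedRuns` are hypothetical names, the sizes `N = 64` and `BS = 16` are made up rather than taken from the test, and `single` is used in place of the `tid == 0` check.

```cpp
#include <vector>

// Hypothetical sizes for this sketch; the real test defines its own N and BS.
constexpr int N = 64;
constexpr int BS = 16;
constexpr int NB = N / BS; // number of blocks per dimension

// Hypothetical stand-in for the test's BlockMatrix: one BS*BS buffer per block.
struct HostBlockMatrix {
  std::vector<std::vector<int>> Blocks;
  HostBlockMatrix() : Blocks(NB * NB, std::vector<int>(BS * BS, 0)) {}
  int *GetBlock(int i, int j) { return Blocks[i * NB + j].data(); }
};

// BlockC += BlockA * BlockB, using the same indexing as the kernel above.
static void MulBlock(const int *BlockA, const int *BlockB, int *BlockC) {
  for (int ii = 0; ii < BS; ++ii)
    for (int jj = 0; jj < BS; ++jj)
      for (int kk = 0; kk < BS; ++kk)
        BlockC[ii + jj * BS] += BlockA[ii + kk * BS] * BlockB[kk + jj * BS];
}

// Deterministic small values so that int overflow is not a concern.
static void Fill(HostBlockMatrix &M, int Seed) {
  for (auto &Block : M.Blocks)
    for (int e = 0; e < BS * BS; ++e)
      Block[e] = (e + Seed) % 5;
}

// Same dependence pattern as above, but without any target region.
void BlockMatMul_HostTasks(HostBlockMatrix &A, HostBlockMatrix &B,
                           HostBlockMatrix &C) {
#pragma omp parallel
#pragma omp single
  for (int i = 0; i < NB; ++i)
    for (int j = 0; j < NB; ++j) {
      int *BlockC = C.GetBlock(i, j);
      for (int k = 0; k < NB; ++k) {
        int *BlockA = A.GetBlock(i, k);
        int *BlockB = B.GetBlock(k, j);
        // The inout dependence on BlockC serializes the k-updates of a block.
#pragma omp task depend(in : BlockA[0], BlockB[0]) depend(inout : BlockC[0])
        MulBlock(BlockA, BlockB, BlockC);
      }
    }
  // All tasks have completed at the barrier that ends the single region.
}

// Serial reference: same loop nest, no OpenMP constructs.
void BlockMatMul_Serial(HostBlockMatrix &A, HostBlockMatrix &B,
                        HostBlockMatrix &C) {
  for (int i = 0; i < NB; ++i)
    for (int j = 0; j < NB; ++j)
      for (int k = 0; k < NB; ++k)
        MulBlock(A.GetBlock(i, k), B.GetBlock(k, j), C.GetBlock(i, j));
}

// Runs the task version Runs times; returns how many runs differ from serial.
int CountMismatchedRuns(int Runs) {
  int Mismatches = 0;
  for (int r = 0; r < Runs; ++r) {
    HostBlockMatrix A, B, CTasks, CSerial;
    Fill(A, r);
    Fill(B, r + 1);
    BlockMatMul_HostTasks(A, B, CTasks);
    BlockMatMul_Serial(A, B, CSerial);
    if (CTasks.Blocks != CSerial.Blocks)
      ++Mismatches;
  }
  return Mismatches;
}
```

Calling `CountMismatchedRuns(200)` from a small driver built with `-fopenmp` would show whether the host-only variant ever diverges. A non-zero count would point toward the host tasking code, while a zero count would not rule anything out given how rarely the original failure shows up.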
With previous failures of bug49334, the following increased how often the problem occurred:
N
The builder ran without failure for ~6 days (http://meinersbur.de:8011/#/builders/11 and now https://lab.llvm.org/staging/#/builders/155/builds/2424) after the last problem was fixed, and is now failing regularly. A recent commit might have introduced the problem.
I bisected the problem, which pointed to 75812e7704fcb7cf89e107cc773c5a2118f380ab. Cherry-picking it on top of 092a5bb72ba8cc5a6cec02cfe61f70130a2c1282 also shows the error, so the issue was actually always there.
As I mentioned in https://github.com/llvm/llvm-project/issues/54081#issuecomment-1051093259, the issue may be in the existing tasking implementation.
The test is disabled for now. I'm removing myself in case anyone else is interested.
First appearing in https://lab.llvm.org/staging/#/builders/155/builds/2424 https://lab.llvm.org/staging/#/builders/154/builds/1838
The failure is flaky, so the problem might have been introduced earlier already.
Previous failures of bug49334: #53730 #50083 #49284