llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Assertion failure in kmp_dispatch #54422

Open ghost opened 2 years ago

ghost commented 2 years ago

I have observed an assertion failure in OpenMP while running a benchmark. I have extracted a reproducer which triggers the assert reliably.

#include <omp.h>

int main(int argc, char *argv[]) {
  long __trans_tmp_1 = 4.0;
  omp_set_num_threads(__trans_tmp_1);
  while (1) {
#pragma omp parallel for schedule(dynamic)
    for (long pidx = 0; pidx < 10.0; pidx++)
      ;
  }
}

I compiled that code with:

bin/clang -fopenmp -fopenmp-version=50 -mcpu=native kmp_assert_static_steal_reproducer.c -o kmp_assert_static_steal_reproducer

This was tested on AArch64; I don't know whether it shows up on other platforms. The error message is as follows:

OMP: Error #13: Assertion failure at kmp_dispatch.cpp(1298).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://bugs.llvm.org/.
Aborted (core dumped)

As far as I can see, this is one of several assertions that have been revealed since the following patch landed: https://reviews.llvm.org/D103648

Finally, while making the reproducer I observed that the types matter to the issue: when I changed the long type to int or float, I could no longer see the issue.
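Since the index type seems to matter, a bounded, type-parameterized variant of this reproducer makes it easier to compare long and int in the same binary. This is a minimal sketch, not the reporter's code: the name `run_dynamic_loop`, the fixed outer iteration count (instead of `while(1)`), the integer bound 10 (instead of 10.0), and the `reduction` used to count executed iterations are all assumptions added for illustration.

```cpp
#include <cassert>

// Hypothetical bounded variant of the reproducer: instead of while(1), run the
// schedule(dynamic) loop a fixed number of times. Index selects the loop
// variable type (long reportedly triggers the assert, int reportedly does not).
template <typename Index>
long long run_dynamic_loop(int outer_iterations) {
  long long total = 0;
  for (int r = 0; r < outer_iterations; ++r) {
    // Same loop shape as the reproducer: 10 iterations, dynamic schedule.
#pragma omp parallel for schedule(dynamic) reduction(+ : total)
    for (Index pidx = 0; pidx < 10; pidx++)
      total += 1;  // count executed iterations
  }
  return total;
}
```

On a correct runtime, `run_dynamic_loop<long>(n)` and `run_dynamic_loop<int>(n)` should each return 10 × n (for example with `OMP_NUM_THREADS=4`); per the report above, only the `long` variant would be expected to abort.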

llvmbot commented 2 years ago

@llvm/issue-subscribers-openmp

ebatsin commented 2 years ago

I also have the same assertion failure as @alban-bridonneau-arm but with a different piece of code.

Sadly, I couldn't easily create a small reproducer (note: the reproducer above does not trigger any assertion on my machine).

Here is the piece of code that fails. I'll try to extract it from my code base and make it independent of PCL / FLANN when I have more time:

pcl::KdTreeFLANN<pcl::PointXYZ>::Ptr kdTree;

/** code that initializes kdTree **/

#pragma omp parallel for schedule(dynamic, 1000) num_threads(d->threads)
for(int i = 0; i < requestedPoints.size(); ++i) {
    std::vector<int> nn_indices;
    std::vector<float> nn_sqr_dists;
    // requestedPoints is a pcl::PointCloud<pcl::PointXYZ>& passed to the function this snippet is in
    kdTree->radiusSearch(requestedPoints[i], 0.2, nn_indices, nn_sqr_dists);
}

The OpenMP options come from target_link_libraries(myApp PRIVATE OpenMP::OpenMP_CXX) in CMake.

When run, I get the following assert:

Assertion failure at kmp_dispatch.cpp(1343): victim.
OMP: Error #13: Assertion failure at kmp_dispatch.cpp(1343).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://bugs.llvm.org/.

This code fails on the following configs:

Config 1
OS: Ubuntu 21.10
compiler: clang 12.0.1-8build1 (target: x86_64-pc-linux-gnu)
linker: LLD 12.0.1
CPU: AMD Ryzen 9 5900X

Config 2
OS: Ubuntu 20.04.4 LTS VM
compiler: clang 12.0.1-++20211029101322+fed41342a82f-1~exp1~20211029221816.4 (target: x86_64-pc-linux-gnu)
linker: LLD 12.0.1
CPU: AMD Epyc 7413 through VMWare

Config 3
OS: Arch Linux
compiler: clang 12 (I no longer have the codebase on that machine and am now on clang 14, so I can't get the build number of the clang version on which I noticed the issue)
linker: LLD 12
CPU: Intel i7-8700
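A PCL/FLANN-independent stand-in with the same loop shape as the snippet above might look like the sketch below. It is hypothetical: `PointXYZ`, `radius_search` (a brute-force search replacing `kdTree->radiusSearch`), and `count_neighbours` are placeholders, not PCL APIs, and nothing here confirms that this reduced version still triggers the assertion.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for pcl::PointXYZ.
struct PointXYZ {
  float x, y, z;
};

// Brute-force replacement for kdTree->radiusSearch: fills the indices and
// squared distances of all points within `radius` of `q`.
static int radius_search(const std::vector<PointXYZ>& cloud, const PointXYZ& q,
                         float radius, std::vector<int>& indices,
                         std::vector<float>& sqr_dists) {
  indices.clear();
  sqr_dists.clear();
  const float r2 = radius * radius;
  for (std::size_t j = 0; j < cloud.size(); ++j) {
    const float dx = cloud[j].x - q.x;
    const float dy = cloud[j].y - q.y;
    const float dz = cloud[j].z - q.z;
    const float d2 = dx * dx + dy * dy + dz * dz;
    if (d2 <= r2) {
      indices.push_back(static_cast<int>(j));
      sqr_dists.push_back(d2);
    }
  }
  return static_cast<int>(indices.size());
}

// Same loop shape as the snippet above: schedule(dynamic, 1000), per-iteration
// vectors, configurable thread count. Returns the total neighbour count so the
// result can be checked.
long long count_neighbours(const std::vector<PointXYZ>& cloud, float radius,
                           int threads) {
  (void)threads;  // only used by the OpenMP pragma
  long long total = 0;
  const int n = static_cast<int>(cloud.size());
#pragma omp parallel for schedule(dynamic, 1000) num_threads(threads) reduction(+ : total)
  for (int i = 0; i < n; ++i) {
    std::vector<int> nn_indices;
    std::vector<float> nn_sqr_dists;
    total += radius_search(cloud, cloud[i], radius, nn_indices, nn_sqr_dists);
  }
  return total;
}
```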

q-p commented 2 years ago

I am seeing the same assertion (clang 14.0.6) in kmp_dispatch.cpp:1298 (https://github.com/llvm/llvm-project/blob/llvmorg-14.0.6/openmp/runtime/src/kmp_dispatch.cpp#L1298 ) as @alban-bridonneau-arm on an M1 Macbook Pro with some of our OpenMP code.

q-p commented 2 years ago

For us the assertion is triggered via the following OpenMP pragma in Alpaka https://github.com/alpaka-group/alpaka/blob/0.6.1/include/alpaka/kernel/TaskKernelCpuOmp2Blocks.hpp#L324

q-p commented 2 years ago

And I can confirm that @alban-bridonneau-arm's reproducer also fails on macOS (MacBook Pro (14-inch, 2021), Apple M1 Pro CPU) if run with at least 3 threads, e.g.

int main (int argc, char *argv[])
{
  while(1)
  {
#pragma omp parallel for schedule(dynamic)
    for (long pidx = 0; pidx < 10; pidx++)
      ;
  }
}
  1. /opt/homebrew/opt/llvm/bin/clang++ -fopenmp kmp_repo.cpp
  2. OMP_NUM_THREADS=8 ./a.out

OMP: Error #13: Assertion failure at kmp_dispatch.cpp(1298).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://bugs.llvm.org/.

Increasing the number of threads seems to trigger the race faster. With one or two threads, the bug does not seem to trigger.
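To compare team sizes within a single process rather than by changing `OMP_NUM_THREADS` between runs, the `num_threads` clause can be swept in a loop. This is a minimal sketch that assumes a bounded repeat count is acceptable in place of `while(1)`; `sweep_thread_counts` is a hypothetical helper name, and on a correct runtime each entry it returns should equal 10 × `repeats`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical harness: run the reproducer's dynamic loop `repeats` times for
// each requested team size and record how many iterations were executed.
std::vector<long long> sweep_thread_counts(const std::vector<int>& team_sizes,
                                           int repeats) {
  std::vector<long long> executed;
  for (int n : team_sizes) {
    (void)n;  // only used by the OpenMP pragma
    long long total = 0;
    for (int r = 0; r < repeats; ++r) {
      // Same loop as the reproducer (long index), with an explicit team size.
#pragma omp parallel for schedule(dynamic) num_threads(n) reduction(+ : total)
      for (long pidx = 0; pidx < 10; pidx++)
        total += 1;
    }
    executed.push_back(total);
  }
  return executed;
}
```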

leedrake5 commented 1 year ago

@q-p @alban-bridonneau-arm See here for a similar problem with the M1 Ultra in the Mac Studio. Note that the initial symptom was partial utilization due to other services. This was followed by a crash, then the following error:

OMP: Error #13: Assertion failure at kmp_dispatch.cpp(1298).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://bugs.llvm.org/.
zsh: abort R

Reproducible code:

require(xgboost)
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
train <- agaricus.train
test <- agaricus.test

bstSparse <- xgboost(data = train$data, label = train$label, max.depth = 20, eta = 1, nthread = -1, nrounds = 2000000, objective = "binary:logistic")

It's just a matter of time before it triggers the error.

Edit - here's the underlying code which reaches out to OpenMP for the above example:

inline int32_t OmpGetNumThreads(int32_t n_threads) {
  if (n_threads <= 0) {
    n_threads = std::min(omp_get_num_procs(), omp_get_max_threads());
  }
  n_threads = std::min(n_threads, OmpGetThreadLimit());
  n_threads = std::max(n_threads, 1);
  return n_threads;
}
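The clamping in this helper can be checked in isolation by passing the runtime queries in as parameters. This is a sketch, not XGBoost's code: `ResolveNumThreads` and its `num_procs`, `max_threads`, and `thread_limit` parameters are hypothetical stand-ins for `omp_get_num_procs()`, `omp_get_max_threads()`, and `OmpGetThreadLimit()`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Standalone copy of the clamping logic in OmpGetNumThreads, with the runtime
// queries passed in as (hypothetical) parameters.
int32_t ResolveNumThreads(int32_t n_threads, int32_t num_procs,
                          int32_t max_threads, int32_t thread_limit) {
  if (n_threads <= 0) {
    // Non-positive requests (e.g. nthread = -1) mean "use what is available".
    n_threads = std::min(num_procs, max_threads);
  }
  n_threads = std::min(n_threads, thread_limit);  // respect the thread limit
  n_threads = std::max(n_threads, 1);             // never go below one thread
  return n_threads;
}
```

Under these assumptions, a request of 1 stays at 1, and a non-positive request such as -1 resolves to min(num_procs, max_threads), capped by the thread limit.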
nawrinsu commented 1 year ago

@alban-bridonneau-arm Could you please share the compiler and OS version?

alban-bridonneau commented 1 year ago

Hi, I can't find the exact commit we were using; it was a top-of-tree commit from the time the bug was raised, so shortly before the LLVM 14 release. LLVM itself was built with GCC 11.2.0. The OS was Ubuntu 18.04. I hope that helps, Alban

q-p commented 1 year ago

For what it's worth (on macOS "Monterey" 12.6) on an Apple M1 Pro, and using my reproducer in https://github.com/llvm/llvm-project/issues/54422#issuecomment-1239527573 I can trigger the error using LLVM 14.0.6

> /opt/homebrew/opt/llvm@14/bin/clang++ -v
Homebrew clang version 14.0.6
Target: arm64-apple-darwin21.6.0
Thread model: posix
InstalledDir: /opt/homebrew/opt/llvm@14/bin

> /opt/homebrew/opt/llvm@14/bin/clang++ -fopenmp -std=c++17 -L /opt/homebrew/opt/llvm@14/lib main.cpp

> OMP_NUM_THREADS=8 ./a.out
OMP: Error #13: Assertion failure at kmp_dispatch.cpp(1298).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://bugs.llvm.org/.
fish: Job 1, 'OMP_NUM_THREADS=8 ./a.out' terminated by signal SIGABRT (Abort)

but with LLVM 15.0.3 the reproducer doesn't trigger the assert. Note that using clang from 15.0.3 with libomp from 14.0.6 still triggers the assert, so the fix clearly comes from changes in the OpenMP library.

So from my observation it seems fixed, but I haven't yet tracked down any particular change that looks like it might do that...

leedrake5 commented 1 year ago

@q-p Unfortunately I'm still running into the assert with LLVM 15.0.3 (though it takes longer to get there, on an M1 Ultra). I don't think it's entirely fixed, but it does seem to be mitigated.

nawrinsu commented 1 year ago

Posted a patch for review that fixes the bug - https://reviews.llvm.org/D139373

shiltian commented 1 year ago

I suppose https://reviews.llvm.org/D139373 fixed the issue. If not, feel free to reopen it.

leedrake5 commented 1 year ago

Still getting this frequently with libomp 15.0.7.

OMP: Error #13: Assertion failure at kmp_dispatch.cpp(1298).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://bugs.llvm.org/.
zsh: abort R

shiltian commented 1 year ago

@leedrake5 The patch was not backported to LLVM 15.0.7. You could give 16 RC1 a shot.

leedrake5 commented 1 year ago

@shiltian Would be happy to - how do I do that? I normally install as brew install libomp

shiltian commented 1 year ago

Oh, then you will have to compile OpenMP yourself. Clone the LLVM project, check out the release branch release/16.x, and then configure and build the project:

$ cmake -G Ninja -DCMAKE_BUILD_TYPE=Release --install-prefix=${SOMEWHERE} -B ${SOMEWHERE} -S ${path-to}/llvm-project/openmp
$ ninja -C ${SOMEWHERE} install

Then set the corresponding environment variable (DYLD_LIBRARY_PATH on macOS, LD_LIBRARY_PATH on Linux) so that the loader can find the library. You might also run into an "unverified developer" warning; I don't know how to deal with that properly, though you can allow it in System Settings.

leedrake5 commented 1 year ago

Copy, will do that and run it through the tests. I appreciate the guidance.

leedrake5 commented 1 year ago

@shiltian I will keep an eye on it, but a machine learning algorithm successfully ran overnight on an M1 Ultra chip with 16 RC1, whereas it failed after an hour with 15.0.7. Much more stable build so far.

leedrake5 commented 1 year ago

So kept an eye on it, looks like it is emerging again with 16.0.3

OMP: Error #13: Assertion failure at kmp_dispatch.cpp(1298).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://bugs.llvm.org/.

Edit: this may be more about how R installs xgboost than about this project; testing this out.

Fails:

install.packages("xgboost", type="source")

Works:

cd ~/GitHub/xgboost
mkdir build
cd build
cmake .. CC=gcc, CXX=g++, -DR_LIB=ON
sudo make -j20
sudo make install
q-p commented 1 year ago

> So kept an eye on it, looks like it is emerging again with 16.0.3

As far as I can tell, there are no changes between 16.0.2 and 16.0.3 in kmp_dispatch.cpp. Maybe you're seeing some other effect (or end up using the wrong runtime)?

guoyaol commented 12 months ago

I have this error with LLVM 16.0.6.

AndreyChurbanov commented 11 months ago

I'd suggest adding an empty line in the code just before the assertion line, so that the assertion message changes and people can tell whether they are really using a fresh library or have inadvertently picked up an old version.
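A complementary check is to ask the dynamic loader which shared object actually provides the OpenMP entry points. The sketch below assumes a platform that provides the `dladdr` extension (glibc on Linux, or macOS) and looks up `omp_get_max_threads` only as a convenient symbol; `loaded_openmp_runtime` is a hypothetical helper. On glibc older than 2.34 it may also need `-ldl`, and if the runtime is linked statically, the reported path is the executable itself.

```cpp
#include <cassert>
#include <string>
#ifdef _OPENMP
#include <dlfcn.h>  // dladdr, Dl_info (non-POSIX extension: glibc, macOS)
#include <omp.h>
#endif

// Hypothetical diagnostic: return the path of the shared object that provides
// omp_get_max_threads, i.e. the OpenMP runtime this process actually loaded.
std::string loaded_openmp_runtime() {
#ifdef _OPENMP
  Dl_info info;
  if (dladdr(reinterpret_cast<void*>(&omp_get_max_threads), &info) != 0 &&
      info.dli_fname != nullptr) {
    return info.dli_fname;
  }
  return "unknown (dladdr failed)";
#else
  return "not compiled with OpenMP";
#endif
}
```

Printing its result at startup (for example with `std::printf("%s\n", loaded_openmp_runtime().c_str())`) shows whether the process picked up the freshly built libomp or an older copy.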

leedrake5 commented 10 months ago

This issue is still present with LLVM 17.0.6. The workaround, however, confuses me.

This happens when nthreads is -1 or any value > 1. The lower the number of threads, the less frequent it is, but it is still present. However, setting nthreads=1 allows models to run. That said, there is still multicore behavior: I see all threads utilized despite specifically asking for single-threaded behavior.

What makes the most sense to me is that nthreads=1 has somehow become nthreads=-1, and that nthreads=-1 maps to a multiple of the maximum thread count, doubling (or more) the demands made in parallel. It's as if the implementation has indexed to the maximum number of cores and counts up from there. I can't think of any other explanation for nthreads=1 using up all my CPU while any other value fails persistently. It is very possible this is an XGBoost problem, but I've read their OpenMP code and it makes sense to me. Maybe something else is happening? I really don't know.
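One way to separate the OpenMP team size from other CPU consumers in the process is to ask the runtime how many threads it actually created for a region with a given request. This is a minimal sketch; `observed_team_size` is a hypothetical helper, and it measures only the OpenMP team for that one region, not threads started by other libraries.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical diagnostic: open a parallel region with the requested team size
// and report how many threads the runtime actually created for it.
int observed_team_size(int requested) {
  int observed = 1;  // serial fallback when compiled without OpenMP
#pragma omp parallel num_threads(requested)
  {
#ifdef _OPENMP
#pragma omp single
    observed = omp_get_num_threads();  // one thread records the team size
#endif
  }
  (void)requested;  // only used by the OpenMP pragma
  return observed;
}
```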

quentin commented 6 months ago

I have this issue on Apple M1 with Xcode 14.2 and libomp from LLVM 17.

Using the reproduction steps above (https://github.com/llvm/llvm-project/issues/54422#issuecomment-1239527573), I also noticed that an int loop variable does not reproduce the bug, but an int64_t does.

shiltian commented 6 months ago

It looks like this issue has not been resolved. Reopening it.

shiltian commented 6 months ago

Can anyone give me a small reproducer that I can try to debug it locally?

quentin commented 6 months ago

Copy of steps above:

int main (int argc, char *argv[])
{
  while(1)
  {
#pragma omp parallel for schedule(dynamic)
    for (long pidx = 0; pidx < 10; pidx++)
      ;
  }
}

Build with OpenMP and run:

OMP_NUM_THREADS=4 ./a.out

In my experience, running this command twice in parallel increases the chance of reproducing the crash.

Replacing long by int "fixes" the issue on M1.

shiltian commented 6 months ago

Unfortunately, although I tried multiple values for OMP_NUM_THREADS (2, 4, 6, 8, 16) and ran each 4-5 times, I never hit a crash on my M2 Ultra. I also tried 18, since the M2 Ultra has 16 performance and 8 efficiency cores (16P8E) and I wondered whether that could cause an issue, but still had no luck.

StefanAtev commented 2 months ago

I am not sure this is the exact same issue, but I am seeing an assertion in kmp_dispatch.cpp (with clang 15 and clang 18; I can eventually test the versions in between). It occurs randomly. Here is a test program that produces the assertion under Ubuntu 24.04 LTS. There is nothing special about the program; it's just a very crude approximation of how my application uses OpenMP and standard threads.

#include <iostream>
#include <thread>
#include <vector>
#include <omp.h>

void runner(int idx)
{
    int sink = 0;
    int iter = 0;
    while (sink != 17)
    {
        for (int tc = 1; tc < 2 * omp_get_max_threads(); ++tc)
        {
            int r = 0;
            const int cs = ((tc + idx) % 3) + 2;
            #pragma omp parallel num_threads(tc)
            {
                int tr = 0;
                #pragma omp for schedule(dynamic, cs) nowait
                for (int i = 0; i < 301; ++i)
                {
                    tr += i;
                }
                #pragma omp critical
                r += tr;
            }
            sink += r;
        }
        ++iter;
        std::cout << "Thread " << idx << " finished iter " << iter << " with result " << sink << std::endl;
    }
}

int main()
{
    std::vector<std::thread> launchers;
    for (int i = 0; i < 3; ++i)
        launchers.emplace_back(&runner, i);
    for (auto & l : launchers)
        l.join();
    return 0;
}

Here are two sample outputs with clang-15:

clang++-15 -fopenmp -O2 /mnt/d/Projects/omp_repro.cpp -o a.out
time ./a.out

Sample output 1:

Thread 2 finished iter 1 with result 1399650
Thread 0 finished iter 1 with result 1399650
Thread 2 finished iter 2 with result 2799300
Thread 1 finished iter 1 with result 1399650
...
Thread 1 finished iter 27 with result 37790550
Thread 2 finished iter 33 with result 46188450
Thread 0 finished iter 30 with result 41989500
Thread 1 finished iter 28 with result 39190200
Thread 2 finished iter 34 with result 47588100
Thread 0 finished iter 31 with result 43389150
Assertion failure at kmp_dispatch.cpp(1456): vnew.p.ub * (UT)chunk <= trip.
OMP: Error #13: Assertion failure at kmp_dispatch.cpp(1456).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://bugs.llvm.org/.
Aborted

real    0m39.347s
user    2m47.038s
sys     7m42.292s

Sample output 2:

Thread 2 finished iter 39 with result 54586350
Thread 1 finished iter 40 with result 55986000
Thread 2 finished iter 40 with result 55986000
Thread 0 finished iter 33 with result 46188450
Assertion failure at kmp_dispatch.cpp(1456): vnew.p.ub * (UT)chunk <= trip.
OMP: Error #13: Assertion failure at kmp_dispatch.cpp(1456).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://bugs.llvm.org/.
Aborted

real    0m53.228s
user    3m44.133s
sys     10m27.280s
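The printed results are consistent with correct arithmetic up to the abort. Assuming `omp_get_max_threads()` returns 16 (the i9-11950H machine reported later in the thread has 16 logical processors), each outer iteration runs 31 team sizes, and each loop sums 0 + 1 + ... + 300 = 45150, giving 31 × 45150 = 1399650 per iteration. A small helper, hypothetical and only meant for checking the logs, reproduces the values:

```cpp
#include <cassert>

// Expected value of `sink` after `iter` completed iterations of the program
// above, assuming omp_get_max_threads() == max_threads and a correct runtime:
// each tc in [1, 2 * max_threads) contributes 0 + 1 + ... + 300.
long long expected_sink(long long iter, long long max_threads) {
  const long long per_loop = 300LL * 301 / 2;        // 45150
  const long long team_sizes = 2 * max_threads - 1;  // number of tc values
  return iter * team_sizes * per_loop;
}
```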

Here are two sample outputs with clang-18 (Ubuntu clang version 18.1.3 (1ubuntu1)). It seems to hit the same assertion (on a different line) much faster:

clang++-18 -fopenmp -O2 /mnt/d/Projects/omp_repro.cpp -o a.out
time ./a.out

Sample run 1:

...
Thread 2 finished iter 17 with result 23794050
Thread 0 finished iter 10 with result 13996500
Thread 1 finished iter 13 with result 18195450
Thread 2 finished iter 18 with result 25193700
Thread 0 finished iter 11 with result 15396150
Thread 1 finished iter 14 with result 19595100
Thread 2 finished iter 19 with result 26593350
Assertion failure at kmp_dispatch.cpp(1617): vnew.p.ub * (UT)chunk <= trip.
OMP: Error #13: Assertion failure at kmp_dispatch.cpp(1617).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://github.com/llvm/llvm-project/issues/.
Aborted

real    0m2.093s
user    0m9.602s
sys     0m23.533s

Sample run 2:

...
Thread 0 finished iter 3 with result 4198950
Thread 0 finished iter 4 with result 5598600
Thread 1 finished iter 4 with result 5598600
Assertion failure at kmp_dispatch.cpp(1617): vnew.p.ub * (UT)chunk <= trip.
OMP: Error #13: Assertion failure at kmp_dispatch.cpp(1617).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://github.com/llvm/llvm-project/issues/.
Aborted

real    0m1.096s
user    0m4.897s
sys     0m12.546s

For comparison, on same OS / machine, using gcc 13, the code runs noticeably faster and keeps running indefinitely:

g++-13 -fopenmp -O2 /mnt/d/Projects/omp_repro.cpp -o a.out

.... keeps going forever

I tried switching the schedule to static, guided, and monotonic:dynamic in an effort to work around the assertion, but I still hit it every time (I was hoping at least one schedule doesn't use the static_stealing scheduler). I am running under WSL, but the issue I am trying to reproduce also occurs consistently on a native Ubuntu box.

Any suggested work-arounds would be welcome.
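To try several schedules without recompiling, the loop can use `schedule(runtime)` and have the schedule set with `omp_set_schedule` (or with the standard `OMP_SCHEDULE` environment variable, e.g. `OMP_SCHEDULE=guided,2`). This is only a sketch: `Sched` and `sum_with_schedule` are hypothetical names, and whether any particular schedule avoids libomp's static-steal path is not established by this code.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

enum class Sched { Static, Dynamic, Guided };

// Hypothetical helper: select the loop schedule at run time via
// schedule(runtime) + omp_set_schedule. Returns 0 + 1 + ... + (n - 1).
long long sum_with_schedule(Sched s, int chunk, int n) {
#ifdef _OPENMP
  omp_sched_t kind = omp_sched_static;
  if (s == Sched::Dynamic) {
    kind = omp_sched_dynamic;
  } else if (s == Sched::Guided) {
    kind = omp_sched_guided;
  }
  omp_set_schedule(kind, chunk);  // chunk < 1 selects the default chunk size
#else
  (void)s;
  (void)chunk;
#endif
  long long total = 0;
#pragma omp parallel for schedule(runtime) reduction(+ : total)
  for (int i = 0; i < n; ++i) {
    total += i;
  }
  return total;
}
```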

nawrinsu commented 2 months ago

@StefanAtev It seems like a different issue. Could you provide the details of the processor used for the test?

StefanAtev commented 2 months ago


These test results are from: Processor: 11th Gen Intel(R) Core(TM) i9-11950H @ 2.60GHz, 2611 Mhz, 8 Core(s), 16 Logical Processor(s)

It was also verified on: 13th Gen Intel(R) Core(TM) i7-13850HX, 2100 Mhz, 20 Core(s), 28 Logical Processor(s)

The same issue occurs on a server-class machine (dual Intel(R) Xeon(R) Gold 5317 CPU @ 3.00GHz). Basically, we are a mixed Win/Linux Intel shop; we had just switched from targeting Ubuntu 20.04 with clang 9 to targeting 24.04 with clang 18 when the issue was observed during testing.

nawrinsu commented 2 months ago

@StefanAtev I'm not sure this is the exact same issue, but I have a patch (https://github.com/llvm/llvm-project/pull/97120) for review to fix a scheduler bug targeting hybrid systems (e.g., Raptor Lake). If possible could you please apply the patch and check if it resolves the issue.

StefanAtev commented 2 months ago


I can try; it will take a while to set up a build from source. At first glance, though, the older machine I tested and the server-class machine (dual Intel(R) Xeon(R) Gold 5317 CPU @ 3.00GHz) don't have E-cores, so I am not sure how the patch is related.