oneapi-src / oneTBB

oneAPI Threading Building Blocks (oneTBB)
https://oneapi-src.github.io/oneTBB/
Apache License 2.0
5.68k stars 1.02k forks source link

std::sort(par_unseq ,...) leaks memory when called repeatedly #1533

Open oschonrock opened 4 days ago

oschonrock commented 4 days ago

Summary

When std::sort(std::execution::par_unseq,...) is called repeatedly it leaks memory. Ultimately it will exhaust the machine's memory and the kernel will kill the process.

Single threaded sort does not leak.

sample code with precise instructions to reproduce is here: https://github.com/oschonrock/tbbleak

tested on Ubuntu 24.04: exact details at repo above discovered using libtbb-dev:amd64 2021.11.0-2ubuntu2 package from ubuntu repos

also confirmed by compiling the head of https://github.com/oneapi-src/oneTBB, ie this commit https://github.com/oneapi-src/oneTBB/commit/42b833fe806606d05a5cad064b8b87365818d716

compiled with gcc-13 and clang-18 (both from ubuntu 24.04)

Observed Behavior

std::sort(par_unseq,.... ) leaks as shown by output from above demo code

Expected Behavior

std::sort(par_unseq,.... ) does not leak

Maybe Related

A very similar issue was reported here in June 2024: https://community.intel.com/t5/Intel-oneAPI-Threading-Building/std-sort-std-execution-par-unseq-has-a-memory-leak-on-Linux/m-p/1582773

and it was solved here:

https://github.com/oneapi-src/oneDPL/pull/1589

but on the oneDPL codebase.

Valgrind

Running:

/build.sh && valgrind ./build/gcc/relwithdebinfo/sort_leak  # see tbbleak repo above

shows some leaks. If the call to std::sort(par_unseq,..) is commented out, the leaks go away

ASAN

When Address sanitizer is run as follows:

./build.sh debug && ./build/gcc/debug/sort_leak  # see other repo above

we get some runtime errors complaining about misaligned addresses (these are not reported when std::sort(par_unseq,...) is commented out)

/usr/include/c++/13/pstl/parallel_backend_tbb.h:515:7: runtime error: member access within misaligned address 0x76af91a090e0 for type 'struct __task', which requires 64 byte alignment
0x76af91a090e0: note: pointer points here
 00 00 00 00  f8 98 bd 34 b3 59 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00

... 

/usr/include/c++/13/pstl/parallel_backend_tbb.h:675:7: runtime error: reference binding to misaligned address 0x76af91a090e0 for type 'struct __as_base ', which requires 64 byte alignment
0x76af91a090e0: note: pointer points here
 00 00 00 00  28 99 bd 34 b3 59 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
kboyarinov commented 4 days ago

@oschonrock thanks for reporting that. As you mentioned, this issue was already fixed in the oneDPL codebase and it was caused by incorrect usage of oneTBB as part of a backend in oneDPL. We cannot fix it in oneTBB since this code does not belong to oneTBB.

If you are using standard parallel algorithms from oneDPL library, consider simply updating the oneDPL to the latest available version. If you are using parallel algorithms directly from the libstdc++ or libc++ implementations, it would be complicated to receive a fix since it was not yet upstreamed. You can either apply the patch from https://github.com/oneapi-src/oneDPL/pull/1589 to your local STL files or submit an issue directly to the STL vendor to speed up the upstreaming from oneDPL to STL.

oschonrock commented 4 days ago

Thanks for you quick response. I have to be honest. From a user perspective it is extremely confusing who owns what code here, and how they interact. oneTBB, oneDPL. libstdc++ / gcc etc

I am using libstdc++ , ie the standard library that is distributed with gcc. And simply calling:

std::sort(std::execution::par-unseq,...

This has never worked without installing an external library from Intel called "TBB" and linking against it. I believe gcc's libstdc++ STL implementation just relies on it. The mechanism of installing "TBB" has changed over the years. It used to require some separate download direct from Intel, but these days it is as simple as

sudo apt install libtbb-dev

And that is how we have run this for years.

It was not clear to me where the patch from oneDPL fits. From what you are saying it is in the "interface glue code" between libstd++ and TBB, but under control of libstdc++?

I went hunting through my current installation(s) of libstdc++, indeed found the parallel_backend_tbb.h and manually and very carefully applied that patch from oneDPL (https://github.com/oneapi-src/oneDPL/pull/1589/files) - the patch won't apply automatically due to different assert macros etc:

This is the resulting diff

--- /usr/include/c++/13/pstl/parallel_backend_tbb.h.orig 2024-10-23 15:56:29.035725777 +0000
+++ /usr/include/c++/13/pstl/parallel_backend_tbb.h     2024-10-23 16:12:13.932914691 +0000
@@ -511,7 +511,7 @@
     friend class __func_task<_Func>;
 };

-#else  // TBB_INTERFACE_VERSION <= 12000
+#else  // TBB_INTERFACE_VERSION > 12000
 class __task : public tbb::detail::d1::task
 {
   protected:
@@ -646,10 +646,15 @@

         _PSTL_ASSERT(__parent != nullptr);
         _PSTL_ASSERT(__parent->_M_refcount.load(std::memory_order_relaxed) > 0);
-        if (--__parent->_M_refcount == 0)
+        auto __refcount = --__parent->_M_refcount;
+
+        // Placing the deallocation after the refcount decrement allows another thread to proceed with tree
+        // folding concurrently with this task cleanup.
+        __alloc.deallocate(this, *__ed);
+
+        if (__refcount == 0)
         {
             _PSTL_ASSERT(__next == nullptr);
-            __alloc.deallocate(this, *__ed);
             return __parent;
         }

I reran my tests on this repo: https://github.com/oschonrock/tbbleak and it does indeed solve the problem.

It also works with the TBB version in the ubuntu repos ie 2021.11.0-2ubuntu2

The above patch does apply automatically now. And it also applies to

/usr/include/c++/14/pstl/parallel_backend_tbb.h

which I believe is the latest available from ubuntu repos.

The libstdc++ mirror repo seems to show that this remains unpatched in the current master of libstd++

https://github.com/gcc-mirror/gcc/blob/4b0f238855f8fa79acf7cca84b523ca8513bf68d/libstdc%2B%2B-v3/include/pstl/parallel_backend_tbb.h#L659

I will figure out how to file a bug on that, or confirm one is filed and try to get this corrected.

Many thanks for your help.

oschonrock commented 4 days ago

bug submitted on libstdc++

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117276