oneapi-src / unified-runtime

https://oneapi-src.github.io/unified-runtime/
Other
36 stars 115 forks source link

race condition in native cpu adaptor allocation functions #2070

Open ldrumm opened 1 month ago

ldrumm commented 1 month ago

There's a subtle race condition in get_alloc_info_entry. get_alloc_info_entry protects the set of known allocations by grabbing the mutex protecting the same, but then returns a reference to the allocation by which point the mutex has been released.

It's plausible that by the time the caller dereferences the returned reference another thread may have freed the given allocation.

At first I thought that this might be enough:

diff --git a/source/adapters/native_cpu/context.hpp b/source/adapters/native_cpu/context.hpp
index c59ab4ea..d3f54b3d 100644
--- a/source/adapters/native_cpu/context.hpp
+++ b/source/adapters/native_cpu/context.hpp
@@ -110,7 +110,7 @@ struct ur_context_handle_t_ : RefCounted {
   }

   // Note this is made non-const to access the mutex
-  const native_cpu::usm_alloc_info &get_alloc_info_entry(const void *ptr) {
+  const native_cpu::usm_alloc_info get_alloc_info_entry(const void *ptr) {
     std::lock_guard<std::mutex> lock(alloc_mutex);
     auto it = allocations.find(ptr);
     if (it == allocations.end()) {

However, returning a value type here doesn't actually help since although it'll prevent a segfault or similar UB, there's no way to determine whether the information returned is still valid.

I'm not sure whether it's something we see in the wild, but it's a real possibility so might be worth a different approach

ldrumm commented 1 month ago

Maybe this is a user-is-responsible problem and we don't care?