llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.48k stars 11.77k forks source link

clang misoptimization cached equality of thread local variable #111257

Open kelbon opened 1 week ago

kelbon commented 1 week ago

commit where clang 18/19 fails: https://github.com/kelbon/kelcoro/pull/28/commits/541a84b207c62c19dbdbc5b35f4788f36cec7b19

code:


  auto job_creator = [&](std::atomic<int32_t>& value) -> dd::job {
    auto th_id = std::this_thread::get_id();
    nocache(th_id);
    (void)co_await dd::jump_on(TP);
    if (th_id == std::this_thread::get_id())
      ++err_c;
    value.fetch_add(1, std::memory_order::release);
    if (value.load(std::memory_order::acquire) == 10)
      value.notify_one();
  };

TP - thread pool

My workaround (after this test works as expected):

[[gnu::noinline]] void nocache(auto&) {
}
auto job_creator = [&](std::atomic<int32_t>& value) -> dd::job {
    auto th_id = std::this_thread::get_id();
    nocache(th_id);
    (void)co_await dd::jump_on(TP);
    if (th_id == std::this_thread::get_id())
      ++err_c;
    value.fetch_add(1, std::memory_order::release);
    if (value.load(std::memory_order::acquire) == 10)
      value.notify_one();
};

So, it is not first issue with caching thread local variables in coroutines, but now i think clang does not cache value of variable (load address), but anyway optimization thinks, that 'th_id' is equal to result of get_id

llvmbot commented 1 week ago

@llvm/issue-subscribers-coroutines

Author: None (kelbon)

commit where clang 18/19 fails: https://github.com/kelbon/kelcoro/pull/28/commits/541a84b207c62c19dbdbc5b35f4788f36cec7b19 code: ```cpp auto job_creator = [&](std::atomic<int32_t>& value) -> dd::job { auto th_id = std::this_thread::get_id(); nocache(th_id); (void)co_await dd::jump_on(TP); if (th_id == std::this_thread::get_id()) ++err_c; value.fetch_add(1, std::memory_order::release); if (value.load(std::memory_order::acquire) == 10) value.notify_one(); }; ``` TP - thread pool My workaround (after this test works as expected): ```cpp [[gnu::noinline]] void nocache(auto&) { } auto job_creator = [&](std::atomic<int32_t>& value) -> dd::job { auto th_id = std::this_thread::get_id(); nocache(th_id); (void)co_await dd::jump_on(TP); if (th_id == std::this_thread::get_id()) ++err_c; value.fetch_add(1, std::memory_order::release); if (value.load(std::memory_order::acquire) == 10) value.notify_one(); }; ``` So, it is not first issue with caching thread local variables in coroutines, but now i think clang does not cache value of variable (load address), but anyway optimization thinks, that 'th_id' is equal to result of get_id
ChuanqiXu9 commented 6 days ago

Actually we never fix the issue foundamentally enough. We just banned merging TLS in coroutines in different optimizations one by one...

kelbon commented 6 days ago

Actually we never fix the issue foundamentally enough. We just banned merging TLS in coroutines in different optimizations one by one...

Isn't this a bigger problem? May clang cache value of just global between co awaits? Or value of member in class ?

ChuanqiXu9 commented 6 days ago

Yeah, that is a big problem.

BTW, this has nothing to do with clang but LLVM optimizations.

kelbon commented 6 days ago

Yeah, that is a big problem.

BTW, this has nothing to do with clang but LLVM optimizations.

but why? If noinline function can prevent optimizations, why coroutine suspend cannot?

ChuanqiXu9 commented 6 days ago

I feel the above problem can't be prevent by inserting noinline functions.

May clang cache value of just global between co awaits? Or value of member in class ?

Sorry to ignore this. It is not about globals nor member in classes, but some (middle end) functions are marked with pure attribute incorrectly. If you're interested, you can look at this https://discourse.llvm.org/t/address-thread-identification-problems-with-coroutine/62015/60

kelbon commented 2 days ago

I feel the above problem can't be prevent by inserting noinline functions.

May clang cache value of just global between co awaits? Or value of member in class ?

Sorry to ignore this. It is not about globals nor member in classes, but some (middle end) functions are marked with pure attribute incorrectly. If you're interested, you can look at this https://discourse.llvm.org/t/address-thread-identification-problems-with-coroutine/62015/60

Not linked witih issue, just a question, do clang have optimization for awaiters which are not really awaiters?

Typical example is


  struct get_handle {
    std::coroutine_handle<> handle;

    static bool await_ready() noexcept {
      return false;
    }
    bool await_suspend(std::coroutine_handle<> h) noexcept {
      handle = h;
      return false;
    }
    std::coroutine_handle<> await_resume() noexcept {
      return handle;
    }
  };

Its something standard forget to add, always false await_suspend or co_this

I have many of such awaiters in my coroutines, if it is not trivial to detect, may be add [[clang::never_suspend]] ?