tud-zih-energy / lo2s

Linux OTF2 Sampling - A Lightweight Node-Level Performance Monitoring Tool
https://tu-dresden.de/zih/forschung/projekte/lo2s?set_language=en
GNU General Public License v3.0
45 stars 13 forks source link

Revisit memory ordering when accessing the ring buffer #287

Closed tilsche closed 1 year ago

tilsche commented 1 year ago

Currently, we use

        std::atomic_thread_fence(std::memory_order_acq_rel);

For all head/tail accesses.

cvonelm commented 1 year ago

That seems wrong, the write access should probably be guarded differently than the read access. That is the case in linux/tools/include/linux/ring_buffer.h, and perf the perf_event.h docstring.

According to the standard, the above statement is overtly conservative, but the closest the standard has to Linux' rmb()/wmb(). That being said, I would like to replace it with std::atomic<T>.store() and std::atomic<T>.load(), which seem to be less wrong than the above and make use of atomic store/load instructions on certain processors, such as ARM64.

This actually compiles away completely on x86-64 with optimization. Which also seems wrong.

This is the expected behaviour according to the standard:

On strongly-ordered systems — x86, SPARC TSO, IBM mainframe, etc. — release-acquire ordering is automatic for the majority of operations. No additional CPU instructions are issued for this synchronization mode; only certain compiler optimizations are affected (e.g., the compiler is prohibited from moving non-atomic stores past the atomic store-release or performing non-atomic loads earlier than the atomic load-acquire).

https://en.cppreference.com/w/cpp/atomic/memory_order

bertwesarg commented 1 year ago

Just in case this comes up again, or needs to be revisited in the future: https://gcc.gnu.org/pipermail/gcc/2023-July/241952.html (not read yet)