kokkos / kokkos-kernels

Kokkos C++ Performance Portability Programming Ecosystem: Math Kernels - Provides BLAS, Sparse BLAS and Graph Kernels
Other
313 stars 98 forks source link

Prefer `expected == atomic_compare_exchange(ptr, expected, desired)` #2387

Closed dalg24 closed 1 month ago

dalg24 commented 1 month ago

Rewrote atomic_compare_exchange_strong(ptr, expected, desired) as expected == atomic_compare_exchange(ptr, expected, desired) as atomic_compare_exchange_strong is on its way to be deprecated/removed (see https://github.com/kokkos/kokkos/pull/7458)

dalg24 commented 1 month ago

If I understand correctly, atomic_compare_exchange_strong updates the expected value (it takes it by reference), whereas atomic_compare_exchange does not, so these re-writes are not equivalent.

No, it takes it by value https://github.com/kokkos/kokkos/blob/3fe17d96d42eb2bd84f1f2a835a2ed0ba87606b6/core/src/Kokkos_Atomics_Desul_Wrapper.hpp#L135-L139 (I know it is confusing. The desul counterpart does but I will propose to overhaul this later. It will be transparent to Kokkos.) The atomic_compare_exchange semantics are easier to reason about. They follow atomicCAS() form CUDA.

dalg24 commented 1 month ago

Here is a link to the latest release (before I started messing with the API) to convince yourself that this is right. It had been unchanged since 4.0 https://github.com/kokkos/kokkos/blob/28ccd1e72832a3bbe9393b7649c3798723bc328c/core/src/Kokkos_Atomics_Desul_Wrapper.hpp#L214-L219

ndellingwood commented 1 month ago

Lots of failing kokkos-kernels nightlies with e.g.

05:34:15 /home/jenkins/blake-new/workspace/KokkosKernels_Nightly_Blake_OneAPI_2023_1_0_Sycl_PV-oneMKL/kokkos-kernels/common/src/KokkosKernels_HashmapAccumulator.hpp:519:23: error: no member named 'atomic_compare_exchange_strong' in namespace 'Kokkos'
05:34:15           if (Kokkos::atomic_compare_exchange_strong(keys + hash, -1, key)) {

I'm assuming this PR will resolve them once ready for merge. Cross-referencing https://github.com/kokkos/kokkos/pull/7458

ndellingwood commented 1 month ago

Multiple jobs failing with timeout in the following tests:

     14 - graph_cuda (Timeout)
     16 - sparse_cuda (Timeout)

graph_cuda seems to be hanging in the test

 [ RUN      ] Cuda.graph_random_graph_coarsen_double_int_int_TestDevice

sparse_cuda seems to be hanging in the test

[ RUN      ] Cuda.sparse_spgemm_double_int_int_TestDevice
ndellingwood commented 1 month ago

@dalg24 can you add the dco signoff as well (check failed)? Sorry for the inconvenience