kokkos / stdBLAS

Reference Implementation for stdBLAS

Draft: Attempt to specialize matrix_vector_product for parallel_policy #255

Open mhoemmen opened 1 year ago

mhoemmen commented 1 year ago

@crtrott @dalg24

@amklinv-nnl has been working on parallel specializations of stdBLAS algorithms. The two of us tried to specialize matrix_vector_product for std::execution::parallel_policy, but we keep getting run-time recursion. We're guessing that the compiler thinks the generic ExecutionPolicy&& overload is "more specialized."
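To illustrate the guess above, here is a minimal sketch of how a forwarding-reference overload can win overload resolution over an overload written for one specific policy type. It is not a diagnosis of the actual test: par_policy is a stand-in for std::execution::parallel_policy (so the snippet builds without <execution> or TBB), and which, chosen, and the call_with_* helpers are hypothetical names used only for this illustration.

```cpp
namespace overload_demo {

// Stand-in for std::execution::parallel_policy (assumption for this sketch).
struct par_policy {};

enum class chosen { generic, specific };

// Generic overload: the forwarding reference binds to anything exactly.
template <class ExecutionPolicy>
constexpr chosen which(ExecutionPolicy&&) { return chosen::generic; }

// "Specialized" overload for the one policy type.
constexpr chosen which(const par_policy&) { return chosen::specific; }

constexpr chosen call_with_nonconst_lvalue() {
  par_policy p{};
  return which(p);  // template deduces par_policy&, no const added
}

constexpr chosen call_with_const_lvalue() {
  const par_policy p{};
  return which(p);  // both bind const par_policy&; non-template wins the tie
}

constexpr chosen call_with_rvalue() {
  return which(par_policy{});  // par_policy&& beats const par_policy&
}

// Only the const-lvalue call reaches the policy-specific overload.
static_assert(call_with_nonconst_lvalue() == chosen::generic, "");
static_assert(call_with_const_lvalue() == chosen::specific, "");
static_assert(call_with_rvalue() == chosen::generic, "");

}  // namespace overload_demo
```

If the generic overload then forwards to the same function name with a policy it cannot map any further, each call would select the generic overload again, which is one way unbounded recursion could arise.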

The test lives in tests/native/gemv_no_ambig.cpp. We're building using the following CMake options:

-DLINALG_ENABLE_TESTS=ON -DLINALG_ENABLE_EXAMPLES=ON -DLINALG_ENABLE_TBB=ON -DTBB_DIR=<PATH_TO_TBB_INSTALLATION>

Please don't merge this branch, btw; it will almost certainly conflict with other PRs.

mhoemmen commented 1 year ago

FYI, TBB can be built and installed from scratch using the following repo: https://github.com/oneapi-src/oneTBB . @amklinv-nnl tested with GCC 13 and it still requires TBB for parallel algorithms to compile, alas.

mhoemmen commented 1 year ago

@amklinv-nnl Christian Trott explained offline how specializations for different policies work.

  1. Don't try to specialize *_is_avail.
  2. Only write specializations for an internal execution policy. Never write specializations for any of the Standard policies.
  3. If needed, overload execpolicy_mapper to map from std::execution::parallel_policy to an internal policy (e.g., impl::some_happy_parallel_policy). Then, overload matrix_vector_product to take impl::some_happy_parallel_policy.

I don't have time to try this at the moment, but it sounds like this should fix the recursion issue (because users wouldn't ever pass in one of the internal execution policies).
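The three steps above could be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the stdBLAS implementation: seq_policy and par_policy stand in for the Standard policies, std::vector replaces mdspan, and inline_exec_t, path, matrix, vec, and gemv_loop are hypothetical names. Only execpolicy_mapper, matrix_vector_product, and impl::some_happy_parallel_policy come from the discussion, and their signatures here are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Stand-ins for the Standard policies (assumption; avoids <execution> and TBB).
struct seq_policy {};
struct par_policy {};

enum class path { serial, parallel };

namespace impl {
struct inline_exec_t {};               // hypothetical internal fallback policy
struct some_happy_parallel_policy {};  // internal parallel policy

// Default mapping: any policy maps to the internal fallback.
template <class Policy>
inline_exec_t execpolicy_mapper(const Policy&) { return {}; }

// Step 3a: map the parallel policy to the internal parallel policy.
inline some_happy_parallel_policy execpolicy_mapper(const par_policy&) { return {}; }
}  // namespace impl

using matrix = std::vector<std::vector<double>>;
using vec = std::vector<double>;

// Plain y = A * x, shared by both internal overloads in this sketch.
inline void gemv_loop(const matrix& A, const vec& x, vec& y) {
  assert(y.size() == A.size());
  for (std::size_t i = 0; i < A.size(); ++i) {
    assert(A[i].size() == x.size());
    double sum = 0.0;
    for (std::size_t j = 0; j < x.size(); ++j) sum += A[i][j] * x[j];
    y[i] = sum;
  }
}

// Step 2: overloads exist only for internal policies.
inline path matrix_vector_product(impl::inline_exec_t, const matrix& A,
                                  const vec& x, vec& y) {
  gemv_loop(A, x, y);
  return path::serial;
}

// Step 3b: a real backend would parallelize over rows here.
inline path matrix_vector_product(impl::some_happy_parallel_policy,
                                  const matrix& A, const vec& x, vec& y) {
  gemv_loop(A, x, y);
  return path::parallel;
}

// Public entry point: map the user's policy, then dispatch once.
template <class ExecutionPolicy>
path matrix_vector_product(ExecutionPolicy&& exec, const matrix& A,
                           const vec& x, vec& y) {
  return matrix_vector_product(impl::execpolicy_mapper(exec), A, x, y);
}

}  // namespace sketch
```

Because the mapper always returns an internal policy type, the second call resolves to one of the non-template overloads and the generic entry point is not re-entered. In this sketch that relies on the internal overloads being non-templates; with templated mdspan arguments, the generic overload would presumably need a constraint that excludes internal policies.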