oneapi-src / oneMKL

oneAPI Math Kernel Library (oneMKL) Interfaces
Apache License 2.0

[lapack][blas][cuda] Update host task impl to use enqueue_native_command #572

Open JackAKirk opened 1 week ago

JackAKirk commented 1 week ago

Description

Update the host task implementation to use enqueue_native_command for blas and lapack on the cuda backend (cublas/cusolver). I covered both backends in a single PR because the cusolver backend uses the cublas backend of oneMKL.

The sycl_ext_codeplay_enqueue_native_command extension reduces latency relative to host_task for native library submissions, and allows native work to be integrated with the SYCL task graph and events. See https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_codeplay_enqueue_native_command.asciidoc for details.

This extension has already been shown to give considerable performance improvements for applications that call oneMKL, such as GROMACS with the oneMKL fft backend. We expect similar improvements for the lapack and blas backends updated here.

I had to update the lapack tests because they relied on the native calls behaving synchronously. That behaviour came from having to synchronize the native streams: with a standard host_task we could not integrate the native event into the SYCL task graph / sycl::event. I did not need to update the blas tests, since they already account for asynchronous behaviour.
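As a rough illustration of why the tests needed an explicit synchronization point, here is a toy model in plain C++. It is not SYCL or oneMKL code: `toy_queue`, `scale_async`, and both test helpers are hypothetical names made up for this sketch, and the queue simply defers all work until `wait()` is called, which is only an analogy for asynchronous submission.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy stand-in for an asynchronous queue (NOT a SYCL type): submit() only
// records the work, and wait() runs everything recorded so far.
class toy_queue {
public:
    void submit(std::function<void()> work) { pending_.push_back(std::move(work)); }

    void wait() {
        for (auto& work : pending_) work();
        pending_.clear();
    }

private:
    std::vector<std::function<void()>> pending_;
};

// Hypothetical library routine: enqueues out[i] = alpha * in[i] and returns
// immediately, without having computed anything yet.
void scale_async(toy_queue& q, const std::vector<float>& in,
                 std::vector<float>& out, float alpha) {
    assert(in.size() == out.size());
    q.submit([&in, &out, alpha] {
        for (std::size_t i = 0; i < in.size(); ++i) out[i] = alpha * in[i];
    });
}

// Compares the output against a reference computed on the host.
bool matches_reference(const std::vector<float>& in,
                       const std::vector<float>& out, float alpha) {
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (out[i] != alpha * in[i]) return false;
    }
    return true;
}

// Updated test style: synchronize explicitly, then check the results.
bool scale_test_with_wait(float alpha) {
    toy_queue q;
    std::vector<float> in{1.0f, 2.0f, 3.0f};
    std::vector<float> out(in.size(), 0.0f);
    scale_async(q, in, out, alpha);
    q.wait();  // results are only guaranteed after this point
    return matches_reference(in, out, alpha);
}

// Old test style: reads the output as soon as the call returns, which only
// works if the call happens to be synchronous.
bool scale_test_without_wait(float alpha) {
    toy_queue q;
    std::vector<float> in{1.0f, 2.0f, 3.0f};
    std::vector<float> out(in.size(), 0.0f);
    scale_async(q, in, out, alpha);
    return matches_reference(in, out, alpha);  // work has not run yet
}
```

In this model, `scale_test_with_wait(2.0f)` passes while `scale_test_without_wait(2.0f)` fails, because the output buffer is still zero-filled when it is read. In a real SYCL test the equivalent step would be waiting on the returned event or the queue before checking results.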

Checklist

https://github.com/oneapi-src/oneMKL/issues/216 is already mostly fixed, but this PR enables out-of-order (ooo) queue interoperability as far as possible, so we can say that it fixes https://github.com/oneapi-src/oneMKL/issues/216

All Submissions

I've added a test for each backend for each of the possible codepaths:

test_main_blas_ct_host_task.txt
test_main_blas_rt_host_task.txt
test_main_lapack_rt_native_command.txt
test_main_lapack_ct_native_command.txt
test_main_lapack_ct_host_task.txt
test_main_lapack_rt_host_task.txt
test_main_blas_ct_res_native_command.txt
test_main_blas_rt_res_native_command.txt

JackAKirk commented 1 week ago

@hdelan please review this when you are back.

JackAKirk commented 1 week ago

I have a small patch ready that updates the cublas backend to implement the missing GEMV_BATCH. I think this brings the cublas backend to a state where everything that maps directly between the oneMKL and cublas APIs is supported to some degree (some types remain unimplemented, such as bfloat16 and some mixed precisions already identified on the issues board). Is it OK for me to add it here, to save the PR review overhead? @Rbiessy what do you think?

Rbiessy commented 1 week ago

I would prefer to have a separate PR to make it clearer which commit implements what.

JackAKirk commented 1 week ago

> I would prefer to have a separate PR to make it clearer which commit implements what.

Yeah OK, I'll be patient, thanks.

JackAKirk commented 6 days ago

Hi @oneapi-src/onemkl-blas-write @oneapi-src/onemkl-lapack-write, would it be possible for you to review this?

Thanks