lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda
Other
289 stars 97 forks source link

Use kernel arg #1304

Closed jxy closed 1 year ago

jxy commented 2 years ago

We first make sure when BlasArg is instantiated, the argument for functor caxpyxmazMR_ is truly passed as a kernel argument. Then we make use_kernel_arg an int, and introduce it to BlasFunctor and caxpyxmazMR_, and let device::use_kernel_arg know that we really want use_kernel_arg if its value is more than 1.

mathiaswagner commented 2 years ago

Jenkins: Can one of the admins verify this patch?

mathiaswagner commented 2 years ago

What's the motivation for this?

jxy commented 2 years ago

This ensures that BlasArg for caxpyxmazMR_ is private to each thread, even if device::max_kernel_arg_size() is 0. It provides backend implementations freedom to choose different ways to pass kernel arguments without accidentally break the code.

jxy commented 2 years ago

I've changed int to an enum, and since it is a custom type, we need a few extra member functions in kernel_param so that we can keep target_device.h independent of kernel_helper.h.

I also fixed HeavyQuarkResidualNorm_ and xpyHeavyQuarkResidualNorm_, which received the same treatment as caxpyxmazMR_ now.

hummingtree commented 2 years ago

I think we should make the enum a enum class which is more error proofing. Also personally I am fine with the it but I remember the enum's in QUDA needs to all be upper case?

maddyscientist commented 2 years ago

Yes, for consistency with QUDA, I would strongly prefer all caps for the enum. Also, instead of yes / no / always, I would prefer TRUE / FALSE / ALWAYS

maddyscientist commented 2 years ago

Apologies if I was unclear @jxy. What I meant for the enum names was to use USE_KERNEL_ARG_FALSE, USE_KERNEL_ARG_TRUE, USE_KERNEL_ARG_ALWAYS.

jxy commented 2 years ago

This is enum class, now. The names are properly scoped. You actually can't directly use FALSE/TRUE/ALWAYS as it is without the enum class name and double colon, unless we move to c++20 and use using enum declaration.

maddyscientist commented 2 years ago

That's a good point, that it's scoped so the prefix is unnecessary.

maddyscientist commented 2 years ago

Latest discussion on portability call is that we should be able to delete the static_asserts in blas_core.cuh and reduce_core.cuh which should be redundant given the static_asserts in the target launcher.

@jcosborn to review this PR.

jxy commented 1 year ago

Removed static_assert as we discussed. I left comments in the constructors to remind ourselves in the future.

maddyscientist commented 1 year ago

test this please

mathiaswagner commented 1 year ago

@Jenkins test this please