Closed jxy closed 1 year ago
Jenkins: Can one of the admins verify this patch?
What's the motivation for this?
This ensures that BlasArg
for caxpyxmazMR_
is private to each thread, even if device::max_kernel_arg_size()
is 0. It provides backend implementations freedom to choose different ways to pass kernel arguments without accidentally break the code.
I've changed int
to an enum
, and since it is a custom type, we need a few extra member functions in kernel_param
so that we can keep target_device.h
independent of kernel_helper.h
.
I also fixed HeavyQuarkResidualNorm_
and xpyHeavyQuarkResidualNorm_
, which received the same treatment as caxpyxmazMR_
now.
I think we should make the enum
a enum class
which is more error proofing. Also personally I am fine with the it but I remember the enum
's in QUDA needs to all be upper case?
Yes, for consistency with QUDA, I would strongly prefer all caps for the enum. Also, instead of yes / no / always, I would prefer TRUE / FALSE / ALWAYS
Apologies if I was unclear @jxy. What I meant for the enum names was to use USE_KERNEL_ARG_FALSE
, USE_KERNEL_ARG_TRUE
, USE_KERNEL_ARG_ALWAYS
.
This is enum class, now. The names are properly scoped. You actually can't directly use FALSE
/TRUE
/ALWAYS
as it is without the enum class name and double colon, unless we move to c++20 and use using enum
declaration.
That's a good point, that it's scoped so the prefix is unnecessary.
Latest discussion on portability call is that we should be able to delete the static_asserts
in blas_core.cuh and reduce_core.cuh which should be redundant given the static_asserts
in the target launcher.
@jcosborn to review this PR.
Removed static_assert
as we discussed. I left comments in the constructors to remind ourselves in the future.
test this please
@Jenkins test this please
We first make sure when
BlasArg
is instantiated, the argument for functorcaxpyxmazMR_
is truly passed as a kernel argument. Then we makeuse_kernel_arg
an int, and introduce it toBlasFunctor
andcaxpyxmazMR_
, and letdevice::use_kernel_arg
know that we really wantuse_kernel_arg
if its value is more than 1.