Closed petrkalos closed 7 years ago
The approach looks perfectly reasonable to me.
I'll try to push the rest of the methods later this week
Implementation of cl_khr_int64_base_atomics and cl_khr_int64_extended_atomics is now in place.
I've decided to get rid of all the "almost the same" atomic_* implementations and use a templated generic atomic_op<typename T, AtomicOp op> and statically specialise for CmpXcgh, Min/Max, Inc/Dec.
What really remains is the a check for when the extensions are enabled with the "#pragma OPENCL EXTENSION cl_khr_int64_base/extended_atomics : enable"
In general this seems fine, but this has broken the ability to use atom_*
functions with 32-bit integer types - you seem to be assuming that the atom_*
functions are only ever 64-bit (which isn't the case). Getting rid of the duplicated code in those atomic functions is nice though.
I'm not sure whether this can be achieved properly with the templated approach that you've opted for, but you could probably do essentially the same thing with a single (non-templated) atomic_op
function that just checks the overload string for the type.
... but this has broken the ability to use atom* functions with 32-bit integer types - you seem to be assuming that the atom* functions are only ever 64-bit (which isn't the case).
I didn't know there are atom_* methods for 32-bit types, could you point me to these in the OpenCL spec?
I'm not sure whether this can be achieved properly with the templated approach that you've opted for, but you could probably do essentially the same thing with a single (non-templated) atomic_op function that just checks the overload string for the type.
I opted for the templated approach as it will produce static code, hence kind of faster and we also avoiding a lot of branching, but I guess we can create a map<string, AtomicOp> and use this. I don't mind re-implementing it like that.
I didn't know there are atom_* methods for 32-bit types, could you point me to these in the OpenCL spec?
They come from the cl_khr_int32_base_atomics
extension for OpenCL 1.0. Although they were renamed to atomic_*
when they were made core in OpenCL 1.1, the atom_*
versions are still required to be supported for backwards compatibility.
Ah apologise for my ignorance, never worker with OpenCL 1.0 I'll implement it in a different style to accommodate this requirement.
Thanks for the feedback
Looks good now, thanks!
Preview implementation of atom_add. Before I implement all the methods, do you have any objections?
fixes #78