jrprice / Oclgrind

An OpenCL device simulator and debugger
Other
346 stars 61 forks source link

implement cl_khr_int64_base_atomics & cl_khr_int64_extended_atomics #127

Closed petrkalos closed 7 years ago

petrkalos commented 7 years ago

Preview implementation of atom_add. Before I implement all the methods, do you have any objections?

fixes #78

jrprice commented 7 years ago

The approach looks perfectly reasonable to me.

petrkalos commented 7 years ago

I'll try to push the rest of the methods later this week

petrkalos commented 7 years ago

Implementation of cl_khr_int64_base_atomics and cl_khr_int64_extended_atomics is now in place.

I've decided to get rid of all the "almost the same" atomic_* implementations and use a templated generic atomic_op<typename T, AtomicOp op> and statically specialise for CmpXcgh, Min/Max, Inc/Dec.

What really remains is the a check for when the extensions are enabled with the "#pragma OPENCL EXTENSION cl_khr_int64_base/extended_atomics : enable"

jrprice commented 7 years ago

In general this seems fine, but this has broken the ability to use atom_* functions with 32-bit integer types - you seem to be assuming that the atom_* functions are only ever 64-bit (which isn't the case). Getting rid of the duplicated code in those atomic functions is nice though.

I'm not sure whether this can be achieved properly with the templated approach that you've opted for, but you could probably do essentially the same thing with a single (non-templated) atomic_op function that just checks the overload string for the type.

petrkalos commented 7 years ago

... but this has broken the ability to use atom* functions with 32-bit integer types - you seem to be assuming that the atom* functions are only ever 64-bit (which isn't the case).

I didn't know there are atom_* methods for 32-bit types, could you point me to these in the OpenCL spec?

I'm not sure whether this can be achieved properly with the templated approach that you've opted for, but you could probably do essentially the same thing with a single (non-templated) atomic_op function that just checks the overload string for the type.

I opted for the templated approach as it will produce static code, hence kind of faster and we also avoiding a lot of branching, but I guess we can create a map<string, AtomicOp> and use this. I don't mind re-implementing it like that.

jrprice commented 7 years ago

I didn't know there are atom_* methods for 32-bit types, could you point me to these in the OpenCL spec?

They come from the cl_khr_int32_base_atomics extension for OpenCL 1.0. Although they were renamed to atomic_* when they were made core in OpenCL 1.1, the atom_* versions are still required to be supported for backwards compatibility.

petrkalos commented 7 years ago

Ah apologise for my ignorance, never worker with OpenCL 1.0 I'll implement it in a different style to accommodate this requirement.

Thanks for the feedback

jrprice commented 7 years ago

Looks good now, thanks!