Open c22aa9b3-00a4-45ea-80bc-0d35b235c840 opened 3 years ago
mentioned in issue llvm/llvm-bugzilla-archive#50259
> is not yet going to work in C++4OpenCL, if I understand correctly?
Correct, this is not yet enabled for OpenCL. But we can certainly look into this hopefully soon. This can be exposed as a Clang specific feature for now.
> One point though: We have a generic code that we want to compile also with CUDA and with HIP. So as long as we cannot use this transparently, it is of little use to us. Probably I could hack something together using the preprocessor, but that is not so nice.
If you don't need this on the CUDA side then you could add a macro indeed that would be empty for CUDA but has the templating address space trick for OpenCL.
> And generally speaking, the required treatment of the address spaces, and the fact that constant cannot be passed to generic, is hampering our development again and again. And it is always only OpenCL causing such trouble. With HIP and CUDA we do not have these issues.
Interesting, does CUDA still protect from incorrect writes into the constant memory or is it left to the application developers?
In general, we are aware that CUDA has a different address space model from OpenCL. In OpenCL we follow more closely the semantics of Embedded C, where address spaces are part of the type qualifier. The generic address space was added to OpenCL only in version 2.0 and is currently not supported by all vendors, so in OpenCL 3.0 it has been made optional. In short, OpenCL's philosophy is to cover a wider range of architectures, but as a result we can't rely on certain features that would simplify support for some devices, even if they are probably supported on many GPUs.
We can certainly look at creating a separate profile or add more extensions for more capable devices in the future and your feedback is very valuable for us to drive the development.
So this example
template<int A, class T> void convert(__attribute__((address_space(A))) T& p) { foo(); }
is not yet going to work in C++4OpenCL, if I understand correctly? I think it would really make sense to make that available. That is basically exactly what is needed. I agree it will blow up the binary size, but that should be ok (at least for us), it will also allow to optimize the code better since each address space that is used can be treated in the best way.
One point though: We have a generic code that we want to compile also with CUDA and with HIP. So as long as we cannot use this transparently, it is of little use to us. Probably I could hack something together using the preprocessor, but that is not so nice.
And generally speaking, the required treatment of the address spaces, and the fact that constant cannot be passed to generic, is hampering our development again and again. And it is always only OpenCL causing such trouble. With HIP and CUDA we do not have these issues. And since OpenCL is basically optional for our operation, there are voices coming up asking why we keep the OpenCL support if it complicates the code.
Now obviously that cannot be solved here and now, and I see that the language is specified that way. But in the long run, I think this should be improved to make C++ with templates usable in OpenCL. Personally, I would even consider a flag or an extension that would automatically instantiate all templates with pointer or reference parameters for all address spaces. This could hopefully work on an on-first-use basis. That would perhaps not be the ideal way on the OpenCL side, but it would allow keeping the code generic.
FYI, since there is a lack of documentation for this feature, a further clarification. The following:
template<int A, class T> void convert(__attribute__((address_space(A))) T& p) { foo(); }
means 'convert' takes an argument of any reference type in any address space. This applies to all address spaces, including 'generic' and 'constant'.
Another alternative that could help here is templating on address space value that is enabled in C++ mode but not in OpenCL yet.
Example:
extern void foo(); extern void bar();
template
template
__attribute__((address_space(1))) int i;
__attribute__((address_space(1))) float f;
void test() {
  convert(i); // calls convert with bar()
  convert(f); // calls convert with foo()
}
https://godbolt.org/z/a9Mrhn9zx
The disadvantage is that a new template instantiation is created for every distinct address space used in the argument when convert is called, resulting in a slightly larger binary, but it could simplify the code.
We can look at enabling this feature for OpenCL if it turns out to be useful...
FYI I just spotted that my previous example is not entirely correct as it would require remove_address_space from llvm/llvm-bugzilla-archive#45326 in std::is_same to work properly. But the idea is the same.
So here is the correct example:
One alternative workaround that can be used in compiled sources is to constrain the template function using enable_if utility from the type traits to prevent the undesirable instantiations of the template and then use normal function overloading instead.
Here is an example:
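#include <type_traits>

extern void foo();
extern void bar();

// Constrained template: only participates in overload resolution when T is int.
template<class T, typename = typename std::enable_if<std::is_same<T, int>::value>::type>
void convert(T& p) { foo(); }

// Plain overload: preferred over the template when both match equally well.
void convert(int& p) { bar(); }

void test() {
  int i;
  convert(i); // this will call convert with bar()
}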
https://godbolt.org/z/bnj7jrG6c
However this means you would need to come up with an argument of enable_if that suits your constraints. This might be tricky in some situations but it is something that might work better than specifying the template arguments everywhere?
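To make the point about choosing the enable_if argument more concrete, here is a minimal sketch in plain ISO C++, without address spaces. It assumes the goal is to keep the generic template away from one type (int) so that an ordinary overload handles it. The integer return tags and the pick_for_int / pick_for_float helpers are hypothetical and exist only to make the selected function observable; they stand in for the foo() / bar() calls used in the thread.

```cpp
#include <type_traits>

// Illustrative tags (not from the thread): 0 = generic template, 1 = plain overload.

// Generic version, disabled for T = int so it cannot compete for int arguments.
template <class T,
          typename = typename std::enable_if<!std::is_same<T, int>::value>::type>
int convert(T&) { return 0; }

// Ordinary overload: unlike a template specialization, it takes part in
// overload resolution, so an int lvalue can bind to const int&.
inline int convert(const int&) { return 1; }

// Hypothetical helpers that report which version gets picked.
inline int pick_for_int()   { int i = 0;      return convert(i); }
inline int pick_for_float() { float f = 0.0f; return convert(f); }
```

In this sketch the int argument goes to the plain overload because the template is removed from the candidate set, while the float argument still goes to the generic template because it is the better match. Whether a comparable constraint can be written for address-space-qualified types depends on the type traits available, such as the remove_address_space utility mentioned in this thread.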
I looked at this issue and it seems that the problem occurs because C++ doesn't consider implicit conversions when selecting template specializations. So the equivalent example in ISO C++ would be as follows:
extern void foo(); extern void bar();
template
template<>
void convert
void test() {
  int i = 0;
  convert(i); // this will call convert with foo()
}
https://godbolt.org/z/1M9TaGhEb
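The behaviour described above can be reproduced with a small self-contained sketch. This is not the code behind the link; it is an assumed analogue that uses const in place of an address space qualifier. The return tags and the pick_for_mutable / pick_for_const helpers are hypothetical names introduced only for illustration.

```cpp
// Illustrative tags (not from the thread): 0 = primary template, 1 = explicit specialization.

// Primary template.
template <class T>
int convert(T&) { return 0; }

// Explicit specialization for T = const int.
template <>
inline int convert<const int>(const int&) { return 1; }

// T is deduced as int, so the primary template is instantiated. The
// specialization is not considered even though int could bind to const int&.
inline int pick_for_mutable() { int i = 0; return convert(i); }

// T is deduced as const int, which matches the specialization exactly.
inline int pick_for_const() { const int c = 0; return convert(c); }
```

The specialization is selected only when the deduced template argument matches it exactly, which is the same effect seen when the argument and the specialization differ in address space.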
In your example the same occurs for the following specializations:
template <> void GPUTrackingRefit::convertTrack<GPUTPCGMMergedTrack, TrackParCov, const Propagator>(GPUTPCGMMergedTrack& trk, const TrackParCov& trkX, const Propagator& prop, float* chi2)
template <> void GPUTrackingRefit::convertTrack<GPUTPCGMTrackParam, GPUTPCGMMergedTrack, GPUTPCGMPropagator>(GPUTPCGMTrackParam& trk, const GPUTPCGMMergedTrack& trkX, GPUTPCGMPropagator& prop, float* chi2)
We can't avoid deducing the address space in the specialized parameters, since they are not going to be instantiated and we would end up with no address space, which is wrong. Changing the C++ rules to consider implicit conversions for specializations doesn't seem trivial.
At the same time, it seems that even when the address space matches the specialization's function parameters, the specialization is still not selected unless the address space is added in the parameters of the specialization itself:
extern void foo(); extern void bar();
template
template<>
void convert
void test(int& ii) {
  convert(ii); // this will call convert with foo()
}
template<>
void convert
void test(int& ii) {
  convert(ii); // this will call convert with bar()
}
So 'T' and 'generic T' in parameters of template specializations alter the specialization selected by the compiler automatically. It is very confusing.
I can't think of any quick solution right now, but I still want to take some time to explore the alternatives. It might be that we will need some sort of templating based on address spaces, and there was a C++ proposal to do this for qualifiers in future standards. However, this might take a long time, and it would be good to evaluate other options too.
Extended Description
I am not sure whether this is a real bug in the sense of the OpenCL specification; it resembles a bit what was discussed here: llvm/llvm-project#41378
My problem is that C++ for OpenCL fails to deduce the template parameters in many cases, when address spaces get involved.
Note: the following test case requires https://reviews.llvm.org/D101168. I have been testing with clang trunk (clang version 13.0.0 (https://github.com/llvm/llvm-project.git 837fded984ed36fa462daeb0f671eec58f71ae26)) + https://reviews.llvm.org/D101168 and with the following command line: /home/qon/alice/llvm-project/build/bin/clang++ -O0 -cl-std=clc++ -x cl -emit-llvm --target=spir64-unknown-unknown -Xclang -fdenormal-fp-math-f32=ieee -cl-mad-enable -cl-no-signed-zeros -ferror-limit=1000 -Dcl_clang_storage_class_specifiers -c fail.cl -o test.bc
The error I am getting is:
In file included from ../Base/opencl-common/GPUReconstructionOCL.cl:80:
In file included from ../Base/GPUReconstructionIncludesDevice.h:104:
../Refit/GPUTrackingRefit.cxx:101:7: error: no viable overloaded '='
trk = trkX;