llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.24k stars 12.07k forks source link

[C++4OpenCL] Problem automatically deducing reference template parameters with address spaces #49506

Open c22aa9b3-00a4-45ea-80bc-0d35b235c840 opened 3 years ago

c22aa9b3-00a4-45ea-80bc-0d35b235c840 commented 3 years ago
Bugzilla Link 50162
Version trunk
OS Linux
Attachments testcase
CC @AnastasiaStulova

Extended Description

I am not sure whether this is a real bug in the sense of the OpenCL specification, it resembles a bit what was discussed here: llvm/llvm-project#41378

My problem is that C++ for OpenCL fails to deduce the template parameters in many cases, when address spaces get involved.

Note: the following test case requires https://reviews.llvm.org/D101168. I have been testing with clang trunk (clang version 13.0.0 (https://github.com/llvm/llvm-project.git 837fded984ed36fa462daeb0f671eec58f71ae26)) + https://reviews.llvm.org/D101168 and with the following command line: /home/qon/alice/llvm-project/build/bin/clang++ -O0 -cl-std=clc++ -x cl -emit-llvm --target=spir64-unknown-unknown -Xclang -fdenormal-fp-math-f32=ieee -cl-mad-enable -cl-no-signed-zeros -ferror-limit=1000 -Dcl_clang_storage_class_specifiers -c fail.cl -o test.bc

The error I am getting is:

In file included from ../Base/opencl-common/GPUReconstructionOCL.cl:80: In file included from ../Base/GPUReconstructionIncludesDevice.h:104: ../Refit/GPUTrackingRefit.cxx:101:7: error: no viable overloaded '=' trk = trkX;


../Refit/GPUTrackingRefit.cxx:215:3: note: in instantiation of function template specialization 'o2::gpu::GPUTrackingRefit::convertTrack<__private o2::track::TrackParametrizationWithError<>, __generic o2::gpu::GPUTPCGMMergedTrack, const __generic o2::base::PropagatorImpl<float> *__private>' requested here
  convertTrack(trk, trkX, prop, &TrackParCovChi2);
  ^
/home/qon/alice/O2/DataFormats/Reconstruction/include/ReconstructionDataFormats/TrackParametrizationWithError.h:50:48: note: candidate function not viable: no known conversion from 'const __generic o2::gpu::GPUTPCGMMergedTrack' to 'const __generic o2::track::TrackParametrizationWithError<>' for 1st argument
                TrackParametrizationWithError& operator=(const TrackParametrizationWithError& src) = default;
                                               ^
/home/qon/alice/O2/DataFormats/Reconstruction/include/ReconstructionDataFormats/TrackParametrizationWithError.h:51:48: note: candidate function not viable: no known conversion from 'const __generic o2::gpu::GPUTPCGMMergedTrack' to '__generic o2::track::TrackParametrizationWithError<>' for 1st argument
                TrackParametrizationWithError& operator=(TrackParametrizationWithError&& src) = default;
                                               ^
In file included from ../Base/opencl-common/GPUReconstructionOCL.cl:80:
In file included from ../Base/GPUReconstructionIncludesDevice.h:104:

I think the problem is that it deduces the template parameters as "__generic TYPE&" etc, and that will not work in all cases, particular if references to non generic variables are passed in. But then, if I explicitly state the template parameters, as I do in workaround.patch, it works. What I am doing is just ommiting the address space completely, so it will now work with all of them (except __constant).
Another workaround is to define the template function in the way
template <class T> void foo(__generic T& bar) ...
but that has the problem that now it only works with the __generic address space, so it breaks when one passes in local variables.

Overall, this situation makes templates with references / pointers pretty much unusable. There are workarounds in our code all over the place, so I am wondering whether one could improve the situation in general. I think the problem is that the address space is part of the template type. Not sure what could be a proper solution, but frankly I do not understand why the address space must become a part of the template type at all, why cannot this be stripped?
AnastasiaStulova commented 2 years ago

mentioned in issue llvm/llvm-bugzilla-archive#50259

AnastasiaStulova commented 3 years ago

is not yet going to work in C++4OpenCL, if I understand correctly?

Correct, this is not yet enabled for OpenCL. But we can certainly look into this hopefully soon. This can be exposed as a Clang specific feature for now.

One point though: We have a generic code that we want to compile also with CUDA and with HIP. So as long as we cannot use this transparently, it is of little use to us. Probably I could hack something together using the preprocessor, but that is not so nice.

If you don't need this on the CUDA side then you could add a macro indeed that would be empty for CUDA but has the templating address space trick for OpenCL.

And generally speaking, the required treatment of the address spaces, and the fact that constant cannot be passed to generic, is hampering our development again and again. And it is always only OpenCL causing such trouble. With HIP and CUDA we do not have these issues.

Interesting, does CUDA still protect from incorrect writes into the constant memory or is it left to the application developers?

In general, we are aware that CUDA has a different address space mode to OpenCL. In OpenCL we follow more the semantics from embedded C where the address spaces are part of the type qualifier and the generic model has been added to OpenCL only in version 2.0 and it is currently not supported by all vendors. So in OpenCL 3.0 generic address space has been made optional. So in short OpenCL's phylosophy is to cover wider range of architectures but as a result we can't rely on certain features that simplify support for some devices even if it is probably currported on many GPUs.

We can certainly look at creating a separate profile or add more extensions for more capable devices in the future and your feedback is very valuable for us to drive the development.

c22aa9b3-00a4-45ea-80bc-0d35b235c840 commented 3 years ago

So this example

template<int A, class T> void convert(attribute((address_space(A))) T& p) { foo(); }

is not yet going to work in C++4OpenCL, if I understand correctly? I think it would really make sense to make that available. That is basically exactly what is needed. I agree it will blow up the binary size, but that should be ok (at least for us), it will also allow to optimize the code better since each address space that is used can be treated in the best way.

One point though: We have a generic code that we want to compile also with CUDA and with HIP. So as long as we cannot use this transparently, it is of little use to us. Probably I could hack something together using the preprocessor, but that is not so nice.

And generally speaking, the required treatment of the address spaces, and the fact that constant cannot be passed to generic, is hampering our development again and again. And it is always only OpenCL causing such trouble. With HIP and CUDA we do not have these issues. And since OpenCL is basically optional for our operation, there are voices coming up asking why we keep the OpenCL support if it complicates the code.

Now obviously that cannot be solved here and yet, and I see that the language is specified that way. But in the long run, I think this should be improved to make C++ with templates usable in OpenCL. Personally, I would even consider a flag or an extension, which would always instantiate all templates with pointers or references as parameter for all address spaces automatically. This could hopefully work on a on-first-use basis. That would perhaps not be the ideal way on the OpenCL side, but it would allow to keep the code generic.

AnastasiaStulova commented 3 years ago

FYI since there is a lack of documentation for this feature - for the further clarifications, the following:

template<int A, class T> void convert(attribute((address_space(A))) T& p) { foo(); }

means 'convert' takes argument of any reference type in any address space. This will apply to all address spaces including 'generic' or 'constant'.

AnastasiaStulova commented 3 years ago

Another alternative that could help here is templating on address space value that is enabled in C++ mode but not in OpenCL yet.

Example:

extern void foo(); extern void bar();

template void convert(T& p) { foo(); }

template void convert(attribute((address_space(I))) int& p) { bar(); }

attribute((address_space(1))) int i; attribute((address_space(1))) float f;

void test(){ convert(i); // calls convert with bar() convert(f); // calls convert with foo() }

https://godbolt.org/z/a9Mrhn9zx

The disadvantage of this is that a new template will be instantiated for every distinct address space used in the argument when convert is called resulting in a slightly larger binary size but it could simplify the code.

We can look at enabling this feature for OpenCL if it turns out to be useful...

AnastasiaStulova commented 3 years ago

FYI I just spotted that my previous example is not entirely correct as it would require remove_address_space from llvm/llvm-bugzilla-archive#45326 in std::is_same to work properly. But the idea is the same.

So here is the correct example:

https://godbolt.org/z/vr7xr6P5o

AnastasiaStulova commented 3 years ago

One alternative workaround that can be used in compiled sources is to constrain the template function using enable_if utility from the type traits to prevent the undesirable instantiations of the template and then use normal function overloading instead.

Here is an example:

pragma OPENCL EXTENSION __cl_clang_function_pointers : enable

pragma OPENCL EXTENSION __cl_clang_variadic_functions : enable

include

pragma OPENCL EXTENSION __cl_clang_function_pointers : disable

pragma OPENCL EXTENSION __cl_clang_variadic_functions : disable

extern void foo(); extern void bar();

template<class T, typename = typename std::enable_if< std::is_same<T, int>::value >::type> void convert(T& p) { foo(); }

void convert(int& p) { bar(); }

void test(){ int i; convert(i); // this will calls convert with bar() }

https://godbolt.org/z/bnj7jrG6c

However this means you would need to come up with an argument of enable_if that suits your constraints. This might be tricky in some situations but it is something that might work better than specifying the template arguments everywhere?

AnastasiaStulova commented 3 years ago

I looked at this issue and it seems that the problem occur due to the fact that C++ doesn't consider implicit conversions when selecting the template specializations. So the equivalent example in ISO C++ would be as follows:

extern void foo(); extern void bar();

template void convert(T& p) { foo(); }

template<> void convert(const int& p) { bar(); }

void test(){ int i = 0; convert(i); // this will call convert with foo() }

https://godbolt.org/z/1M9TaGhEb

In your example the same occurs for the following specializations:

template <> void GPUTrackingRefit::convertTrack<GPUTPCGMMergedTrack, TrackParCov, const Propagator>(GPUTPCGMMergedTrack& trk, const TrackParCov& trkX, const Propagator& pr op, float* chi2)

template <> void GPUTrackingRefit::convertTrack<GPUTPCGMTrackParam, GPUTPCGMMergedTrack, GPUTPCGMPropagator>(GPUTPCGMTrackParam& trk, const GPUTPCGMMergedTrack& trkX, GPUTPC GMPropagator& prop, float* chi2)

We can't avoid deducing the address space in the specialized parameters since they are not going to be instantiated and we will end up with no address space which is wrong. Changeing the C++ rules to consider the implicit conversions for the specializations doesn't seem trivial.

At the same time it seems that even when the address space match with the specializtion function parameters, it is still not selected unless the address space is added in the parameters of specialization itself:

extern void foo(); extern void bar();

template void convert(T& p) { foo(); }

ifdef FOO

template<> void convert(int& p) { bar(); }

void test(int& ii){ convert(ii); // this will call convert with foo() }

else

template<> void convert(int& p) { bar(); }

void test(int& ii){ convert(ii); // this will call convert with bar() }

endif

So 'T' and 'generic T' in parameters of template specializations alter the specialization selected by the compiler automatically. It is very confusing.

I can't think of any quick solution right now but I still want to give some time to explore the alternatives. It might be that we will need some sort of templating based on the address spaces and there was a C++ proposal to do this for the qualifiers in the future standards. However, this might be taking long time and it would be good to evaluate other options too.