Use of undeclared identifier 'surf2Dwrite'

llvmbot commented 8 years ago


Bugzilla Link	28919
Version	trunk
OS	Linux
Attachments	File containing the CUDA kernel, Include `surface_indirect_functions.h` from `__clang_cuda_runtime_wrapper.h`, clang generated PTX file
Reporter	LLVM Bugzilla Contributor
CC	@jlebar,@Artem-B

Extended Description

Hello,

I wanted to give clang a go on compiling my CUDA kernels. I was able to get the example at bit.ly/llvm-cuda to compile and run properly, but when I tried the attached code, I got the error message "error: use of undeclared identifier 'surf2Dwrite'".

Versions used:

CUDA: 7.5
clang: 3.8 and 63240447ac6e ("[analyzer] Try to fix coverity CID 1360469.")
llvm: 3.8 and 04876e5fe54e ("[X86][XOP] Add support for combining target shuffles to VPERMIL2PD/VPERMIL2PS")

The compilation command as well as the output look as follow (with both previously mentioned versions of clang+LLVM):

% clang++ -std=c++11 --cuda-path=/opt/cuda --cuda-gpu-arch=sm_30 -c surface.cu
surface.cu:15:3: error: use of undeclared identifier 'surf2Dwrite'
  surf2Dwrite(make_float4(0.0f, 0.0f, 0.0f, 1.0f), result_surface, index.x * sizeof(float4), index.y);
  ^
1 error generated

If I include directly surface_indirect_functions.h which defines surf2Dwrite, then the program does compile (well, sort of depending on the clang version, but that looks like a different bug, so different bug report). Playing with the CUDA headers, I added some #error "foo" inside device_functions.h which is included by cuda_runtime.h, and in turns includes surface_indirect_functions.h: so one right after the #define __DEVICE_FUNCTIONS_H__, and another one (with a different message) in an else clause to the #ifdef __DEVICE_FUNCTIONS_H__. To resume, this is how the modified device_functions.h looks like:

#ifndef __DEVICE_FUNCTIONS_H__
#define __DEVICE_FUNCTIONS_H__
#error "foo"

// [snip] untouched content

#else
#error "bar"
#endif // __DEVICE_FUNCTIONS_H__

And here is the corresponding compilation output with clang:

% clang++ -std=c++11 --cuda-path=/opt/cuda --cuda-gpu-arch=sm_30 -c surface.cu
In file included from <built-in>:1:
In file included from /home/pmoreau/privat/userspace/llvm/bin/../lib/clang/4.0.0/include/__clang_cuda_runtime_wrapper.h:98:
In file included from /opt/cuda/include/cuda_runtime.h:115:
/opt/cuda/include/device_functions.h:4247:2: error: "bar"
#error "bar"
 ^
In file included from <built-in>:1:
In file included from /home/pmoreau/privat/userspace/llvm/bin/../lib/clang/4.0.0/include/__clang_cuda_runtime_wrapper.h:156:
In file included from /opt/cuda/include/math_functions.hpp:1065:
/opt/cuda/include/device_functions.h:4247:2: error: "bar"
#error "bar"
 ^
2 errors generated.

Apparently __DEVICE_FUNCTIONS_H__ is getting defined by another file since we do not get an error: "foo".

Thank you in advance! Pierre

Artem-B commented 8 years ago

Textures and surfaces are in roughly the same state. There's (some?) underlying support in LLVM, but we're missing the 'glue' on the clang side. Part of the problem is that we don't know yet what exactly clang needs to implement and how.

As a step 1, we need to figure out what builtins are missing and what they ultimately do. My guess is that most of them would map to something relatively simple that NVPTX back-end in LLVM may already provide via some intrinsic. There are number of similar cases where we implement clang's builtin with LLVM-side intrinsic already. Search for nvvm in clang's CGBuiltins.cpp for examples.

Some of builtins may have no directly usable support in LLVM. For those we'll need to either generate IR in clang using available LLVM intrinsics, if they are sufficient, implement required functionality as LLVM intrinsic and call it from clang, or combination of both approaches.

Then there's a question of texture or surface handles. If memory serves me, on C/C++ level those are just regular data types. On PTX level they must end up being .texref, .samplerref, or .surfref. LLVM has NVPTXReplaceImageHandles.cpp pass which appears to implement this magic, but I don't know what exactly it needs/expects from clang to do its job.

llvmbot commented 8 years ago

Fair enough :-D Unfortunately my knowledge of compilers is limited, and I don’t have that much time either. If you have some pointers on where/how to get started, previous patches for texture, tips, I’ll give it a try. :-)

I would guess that doing the basic plumbing shouldn’t be that difficult, but having the compiler optimise them, on the other hand, is harder? Though the optimisations should be the same as for textures, which are already supported, or at least partially, I think?

Artem-B commented 8 years ago

I'm not aware of anybody working on it at the moment. Patches would be welcome. :-)

llvmbot commented 8 years ago

Ah, that explains why it isn’t plugged in. Any ETA on when that could be supported? Whether it is a WIP?

Artem-B commented 8 years ago

Indeed, clang currently does not have builtins needed for surface/texture lookups to work.

ajaypbrt commented 1 year ago

Hello, I saw texture support was added in https://reviews.llvm.org/D110089 and from the discussion it seems that surface function support would follow a similar path. Right now, I believe support for the functions in surface_indirect_functions.h is the only remaining piece I need to compile my cuda library with clang, so I have a vested interest in seeing support for this.

Is that still on your todo list @Artem-B? I'm also happy to try and contribute a patch. I read the diff you contributed for texture support and think I understand the high-level picture of what would need to be done for this. Please let me know!

Artem-B commented 1 year ago

There's little to no demand for textures and surfaces support, so I don't have any specific plans to add surface APIs.

Surface support patch would be similar in structure to D110089. If you would like to give it a try, go ahead and I can help reviewing the patch.

I'm curious -- what do you plan to use the surfaces for? What kind of advantage do you expect to see?

ajaypbrt commented 1 year ago

@Artem-B I'd like to build https://github.com/alicevision/popsift with clang and it makes some calls to surf2DLayeredwrite<float>(...). I'll admit, I am quite new to writing cuda and don't know what advantages surfaces offer besides it being the implementation choice of the author(s). I suppose I could implement the same functionality without surfaces. Do you have any opinions about that?

I took a brief look yesterday, using nvcc to find the appropriate ptx calls for a subset of the surface functions and it seemed pretty straightforward. I am more concerned with how to structure the code and testing correctness across the matrix of cuda versions and device architectures than writing any one individual lookup.

A few questions if I do embark down trying to contribute something here:

Off the bat, why do need to define __nv_tex_surf_handler? Is it because it is an nvcc builtin?
How do I determine which template instances should be supported?
Would it be reasonable to redefine __nv_tex_surf_handler just for surfaces in another header, e.g. __clang_cuda_surface_indirect_functions.h and include that separately along with cuda's surface_indirect_functions.h? Therefore, not try and merge with your implementation of textures?
How would I write tests for this that could run in your CI?

Finally, if you don't think a contribution would be all that valuable, it would be a lot easier for me to just implement the lookup patch locally for the one surface function overload we are using here. I don't want to contribute something no one will use.

Artem-B commented 1 year ago

Do you have any opinions about that?

Not really. I know that in the past using textures for data fetches used different data path and gave some performance advantages, but that's not been the case for a very long time now. Textures have convenient functionality useful in some cases, but I haven't seen much use of it in practice. The project you've linked to seems to be using some of those fancy features. E.g. they rely on argument interpolation. Whether it's actually faster than fetching neighboring results and interpolating manually -- I do not know, but it would be an interesting experiment to try.

how to structure the code and testing correctness across the matrix of cuda versions and device architectures than writing any one individual lookup.

I don't think there's much (if any?) variation across CUDA versions or GPU variants. I think the only thing that's changed is that some APIs were removed in CUDA-12. https://github.com/llvm/llvm-project/blob/252c42354eca54274ed7b10c32c73c6937478e8b/clang/lib/Headers/__clang_cuda_texture_intrinsics.h#L669

Off the bat, why do need to define __nv_tex_surf_handler? Is it because it is an nvcc builtin? Yes. It's also an odd one, as it's parametrized by a string argument, which makes it somewhat awkward to implement efficiently. We could've defined it as a real function with a const char * argument, but then we'd have to search for the matching string at runtime. The in-header implementation as a macro in clang converts it into __tex_fetch<hash(texture_op_string)>() which allows defining separate per-operation functions which can be directly called w/o overhead of the string comparison.

How do I determine which template instances should be supported?

Search surface_indirect_functions.h in CUDA headers for all instances of __nv_tex_surf_handler calls and collect all string arguments they use.

Would it be reasonable to redefine __nv_tex_surf_handler just for surfaces in another header,

There should be only one implementation of __nv_tex_surf_handler, otherwise things will not work if a user tries to use it outside of the headers provided by clang. I'm not quite sure what you have in mind by "merge" here. __nv_tex_surf_handler itself is just a trivial macro expanding into a function template call. Providing function template specialization is largely independent for the textures and for the surfaces. You will need extend the perfect hash calculation to cover new string constants for the surface operations, but that's the only common part. The implementation if the functions can live in a separate file.

We could also put both texture and surface APIs into the same file __clang_cuda_texture_intrinsics.h and, maybe, rename it to __clang_cuda_texsurf_intrinsics.h.

How would I write tests for this that could run in your CI

Testing will need to be split into two parts. In-tree testing would verify that __nv_tex_surf_handler() called with the operations used by CUDA headers produces the same surface instructions the calls would produce with NVCC. That would probably be sufficient as we'd rely on NVCC as the reference for the correct code.

If you want to verify actual functionality, that would need to be added to CUDA tests in LLVM's test-suite. I'm not familiar enough with surfaces to tell what a reasonable test set would look like.

Finally, if you don't think a contribution would be all that valuable

It's not that black and white. First, it's a bit of a chicken-and-egg problem. If clang does not support it, users don't use it, If nobody uses it, there's not much pressure to add support. If/when we do add a feature, and it works, users rarely, if ever, announce that they are using it, so I have no clue whether anyone is actually using the textures with clang. I suspect some users do, but most of them are probably not aware of that. We've just got a bit more source code out there in the wild which will now compile with clang.

it would be a lot easier for me to just implement the lookup patch locally for the one surface function overload we are using here. I don't want to contribute something no one will use.

Will proper implementation take more work than just patching around your specific issue? Certainly. Will the patch be useful? Probably. Will it be essential? Probably not. We've survived till now. Will you know whether anyone actually uses it? Probably not.

So, the tangible effect of the patch will most likely be limited by my "thank you" and the satisfaction that you've made someone's life a bit easier (though they may not even know it).

ajaypbrt commented 1 year ago

@Artem-B Makes sense, thanks for the in-depth reply. Two follow-ups:

Are there any examples in the codebase of such tests for texture operations? I'm using that as my guide for this, so that would be useful.
I was unclear, the templating I'm talking about is for example in template<class T> void surf2DLayeredwrite(T data, cudaSurfaceObject surfObj, int x, int y, int layer, boundaryMode = cudaBoundaryModeTrap); what values of T do I need to support? I agree, I can get all the string arguments to __nv_tex_surf_handler with a quick search.

Artem-B commented 1 year ago

Are there any examples in the codebase of such tests for texture operations? I'm using that as my guide for this, so that would be useful.

:-( We currently have none. The only test we have now verifies that the header file compiles and largely tests that we have no hash collisions for the implemented texture functions, but that's about it. I have a script which was supposed to generate simple test cases for all texture op variants, but I've never got it finished (e.g. it looks like it only generates direct accesses at the moment) . https://gist.github.com/Artem-B/fe6092aa228d146ba4001bb95041b5f2

I was unclear, the templating I'm talking about is for example in template void surf2DLayeredwrite(T data, cudaSurfaceObject surfObj, int x, int y, int layer, boundaryMode = cudaBoundaryModeTrap); what values of T do I need to support?

I think those would be the types used to specialize __nv_isurf_trait in CUDA's surface_indirect_functions.h

ajaypbrt commented 1 year ago

:-( We currently have none. The only test we have now verifies that the header file compiles and largely tests that we have no hash collisions for the implemented texture functions, but that's about it. I have a script which was supposed to generate simple test cases for all texture op variants, but I've never got it finished (e.g. it looks like it only generates direct accesses at the moment) . https://gist.github.com/Artem-B/fe6092aa228d146ba4001bb95041b5f2

I probably will follow suit then.

I think those would be the types used to specialize __nv_isurf_trait in CUDA's surface_indirect_functions.h

Cool, that was my suspicion, thank you for confirming.

I think this should be everything I need to get started, thanks!

llvm / llvm-project

Use of undeclared identifier 'surf2Dwrite' #29289

Extended Description