jml1795 opened 5 years ago
@obilaniu I pushed a commit to my fork that demonstrates a hacky fix. I don't really like it because it is specific to this one flag, but I think it shows the problem more clearly. The commit isn't meant to be merged, just for demonstration: commit
Also, possibly related: who do we want to do the linking in CUDA + C++ libraries, nvcc or the C++ linker? It seems like using the C++ toolchain as much as possible would avoid some of the headache with wrapping the flags in `-Xcompiler`. I adjusted the order of `clink_langs` as an experiment in that same commit, which used the C++ linker. In case we swap them, we should only need to link in `cudart` based on the presence of CUDA code, which should be done behind the scenes so the user doesn't have to manually do a `find_library` and dependency declaration.
@jml1795 I cannot see that commit.
> Also, possibly related, but who do we want to do the linking in cuda + c++ libraries - nvcc or the C++ linker? It seems like using the c++ toolchain as much as possible would avoid some of the headache w/ wrapping the flags in `-Xcompiler`.
Oh, this I have studied. If the linking is not done by `nvcc` but by CUDA-unaware tools like `ld`, then the final stage of linking requires an extra step that extracts the special device sections of `nvcc`-generated host object files and puts them together into a new host object file: `nvcc -dlink -o devicefinallink.o objs*.o`. This `devicefinallink.o` file and the `objs*.o` are then linked with the CUDA-unaware linker into a final shared library/executable.

Since the CUDA-unaware linker blissfully ignores the special device sections of the host object files, it effectively drops them; this very problem explains why the `-dlink` step is required. It pulls the special device sections containing device code from the host object files and, after linking, embeds them in a "regular" data section of a new host object file, ready for linking and reference from host code. Failure to do this results in mystery symbols referenced from `__cudaRegisterLinkedBinary`.
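The two-stage flow described above can be sketched as a plain command sequence. This is a hand-written illustration, not runnable without the CUDA toolkit; the file names and the `sm_70` architecture are made up:

```shell
# 1) Compile each .cu with relocatable device code so the special
#    device sections are kept in the host object files.
nvcc -dc -arch=sm_70 -Xcompiler -fPIC a.cu -o a.o
nvcc -dc -arch=sm_70 -Xcompiler -fPIC b.cu -o b.o

# 2) Device-link: collect the device sections from all objects into
#    one ordinary host object that the host linker can understand.
nvcc -dlink -arch=sm_70 -o devicefinallink.o a.o b.o

# 3) Final link with the CUDA-unaware toolchain; cudart must now be
#    named explicitly because nvcc is no longer driving the link.
g++ -shared -o libfoo.so a.o b.o devicefinallink.o -lcudart
```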
As a final detail, this linking step should be done with the C++ compiler/linker, as the CUDA kernel magic call syntax `<<<>>>` generates some C++ behind the scenes, unless `-fno-exceptions` is used. It is non-trivial to detect whether a `.cu` file is linkable with just the C library or needs the C++ library. To play it safe, we should always assume the C++ library will be required.
I believe we should use the host linker. The final link must be done by one tool, and if the presence of a single CUDA file causes us to require `nvcc`, then the next time someone requires special link tools, we'll have an unsolvable conflict.

I believe we should track the compiler that produced an object file, if this isn't being done already. If that object file came from `gcc`, then it definitely does not contain special GPU device sections, but if it came from `nvcc` it might. All objects participating in a shared or executable target's final link step that are found to have come from `nvcc` would then participate in the extra, preparatory `nvcc -dlink` step.
CUDA code presence is non-trivial to detect; not all `.cu` files necessarily have some. I'd prefer to leave `-cudart none/static/shared` in the hands of the user as a `cuda_args:`.
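For illustration, leaving that choice in the user's hands might look like this in a target definition (a sketch; the target and source names are made up):

```meson
# Hypothetical target: the user selects the CUDA runtime flavour
# themselves by passing -cudart through cuda_args.
lib = shared_library('foo', 'kernels.cu',
  cuda_args: ['-cudart', 'shared'])
```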
@obilaniu I updated the link.

So is the plan to eventually use `-dlink` to produce the intermediate object and then link with `ld` (for example)? I'm assuming this is what you mean by "host", at least.
@jml1795 Yes, using the CUDA linker would be relying on `nvcc`, whereas going through the host system linker would entail `-dlink`'ing and then supplying that as an extra, but otherwise completely normal, object.
I think the proper way to add OpenMP, according to the Meson documentation, is to use something like the following line:

```meson
dependency('openmp', version: '>=4.0', language: 'cpp', required: false)
```

However, in practice it does not work, throwing the following error, even though the language is cpp and not cuda. So there could be multiple bugs related to this overall issue.

```
ERROR: Language Cuda does not support OpenMP flags.
```
That error is because the CUDA compiler doesn't specify any flags for OpenMP, and it falls through to the default "not supported".
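One conceivable workaround, until the CUDA compiler learns OpenMP flags, is to keep the OpenMP dependency on the C++ parts only, e.g. by splitting the CUDA sources into their own target. A sketch with made-up names:

```meson
openmp_dep = dependency('openmp', language: 'cpp', required: false)

# CUDA code lives in its own target and never sees -fopenmp...
cuda_lib = static_library('foo_cuda', 'kernels.cu')

# ...while only the C++ target gets the OpenMP flags.
lib = shared_library('foo', 'host.cpp',
  dependencies: [openmp_dep],
  link_with: [cuda_lib])
```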
I'm having some trouble compiling CUDA with the latest Meson and would like to inquire what the state of this is.

Meson decides to use `nvcc` as the linker for me (probably because this issue here isn't solved yet). Fine by me, but: I see `-Xlinker=-fopenmp` being passed to `nvcc`, and `ld` then errors out with

```
-f may not be used without -shared
```

I think that's because `-Xlinker` is wrong; `-fopenmp` is only accepted by the compiler (i.e. gcc), not the linker.

What options do I have?
@nh2 Ay, this is not good. That means there's a break in Meson's support of the CUDA compiler.
For a personal project of mine, which antedated the arrival of CUDA support in Meson, I used `nvcc` as a generator to compile object files, which would be gathered into a "pre-dlink" static library. I would then use `nvcc` again as a generator to `-dlink` the "pre-dlink" static library's objects into a single host object file. This single object would then get linked into a "post-dlink" static library. The "post-dlink" static library would then be what gets used everywhere it's needed.

That gave me full control of the flags to `nvcc`, but did result in pointlessly named static libraries. I badly want CMake-style `object_library()`, but I don't think they've been appreciated (https://github.com/mesonbuild/meson/pull/622#issuecomment-251590086).
@obilaniu Do you have an example code snippet for your old approach? That would at least allow me to get my project building in the meantime :)
@nh2
```meson
#
# Compile generator and linker args.
#
nvccCompGen = generator(nvcc,
  arguments: nvcc_ccbin_flag + nvcc_gencode_flags + nvcc_cudart_flags + nvcc_opt_flags + nvcuvid_iflags +
             ['-x', 'cu', '@EXTRA_ARGS@', '@INPUT@', '-c', '-o', '@OUTPUT@'],
  output: ['@BASENAME@.o'],
)
nvccLinkArgs = ['-dlink'] + nvcc_ccbin_flag + nvcc_cudart_flags + nvcc_gencode_flags

### CUDA code compilation
libfoocuda_cdefs    = ['-DFOO_CUDA_IS_SHARED=1']
libfoocudaCuda_srcs = files('kernels.cu')
libfoocudaCuda_objs = nvccCompGen.process(libfoocudaCuda_srcs, extra_args: [
  '-Xcompiler', '-fPIC',
  '-Xptxas', '--warn-on-double-precision-use,-O3',
  '-DFOO_CUDA_IS_BUILDING=1',
  '-DFOO_IS_SHARED=1',
] + libfoocuda_cdefs + foo_iflags)
libfoocudaCuda_sta = static_library('pre-dlink', libfoocudaCuda_objs)
libfoocudaCuda_dep = declare_dependency(link_whole: libfoocudaCuda_sta)
libfoocudaCuda_objs = custom_target('libfoocudaCuda-dlink',
  command: [nvcc, '-shared'] + nvccLinkArgs + ['@INPUT@', '-o', '@OUTPUT@'],
  input: libfoocudaCuda_sta,
  output: ['@BASENAME@-dlink.o'],
  build_by_default: true,
  install: false,
)
libfoocudaCuda_sta = static_library('dlink', libfoocudaCuda_objs)
libfoocudaCuda_dep = declare_dependency(dependencies: libfoocudaCuda_dep,
                                        link_whole: libfoocudaCuda_sta)
```
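A consumer would then use the final dependency like any other target dependency; a hypothetical usage (the explicit `-lcudart` is needed because the host linker, not `nvcc`, performs the final link):

```meson
exe = executable('myexe', 'main.cpp',
  dependencies: [libfoocudaCuda_dep],
  link_args: ['-lcudart'])
```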
@obilaniu Questions: I adapted your snippet as below, stubbing out the flag lists and renaming the second set of variables with a `_final` suffix, but configuring fails with the error shown at the end.

```meson
nvcc_ccbin_flag    = []
nvcc_gencode_flags = []
nvcc_cudart_flags  = []
nvcc_opt_flags     = []
nvcc_iflags        = []

nvccCompGen = generator(nvcc,
  arguments: nvcc_ccbin_flag + nvcc_gencode_flags + nvcc_cudart_flags + nvcc_opt_flags + nvcc_iflags +
             ['-x', 'cu', '@EXTRA_ARGS@', '@INPUT@', '-c', '-o', '@OUTPUT@'],
  output: ['@BASENAME@.o'],
)
nvccLinkArgs = ['-dlink'] + nvcc_ccbin_flag + nvcc_cudart_flags + nvcc_gencode_flags

### CUDA code compilation
libfoocuda_cdefs    = [] # C preprocessor -Dstuff goes in here
libfoocudaCuda_srcs = files('kernels.cu')
libfoocudaCuda_objs = nvccCompGen.process(libfoocudaCuda_srcs, extra_args: [
  '-Xcompiler', '-fPIC',
  '-Xptxas', '--warn-on-double-precision-use,-O3',
] + libfoocuda_cdefs)
libfoocudaCuda_sta = static_library('pre-dlink', libfoocudaCuda_objs)
libfoocudaCuda_dep = declare_dependency(link_whole: libfoocudaCuda_sta)
libfoocudaCuda_objs_final = custom_target('libfoocudaCuda-dlink',
  command: [nvcc, '-shared'] + nvccLinkArgs + ['@INPUT@', '-o', '@OUTPUT@'],
  input: libfoocudaCuda_sta,
  output: ['@BASENAME@-dlink.o'],
  build_by_default: true,
  install: false,
)
libfoocudaCuda_sta_final = static_library('dlink', libfoocudaCuda_objs_final)
libfoocudaCuda_dep_final = declare_dependency(dependencies: libfoocudaCuda_dep,
                                              link_whole: libfoocudaCuda_sta_final)
```

```
ERROR: Linker nvlink does not support link_whole
```
@obilaniu Oh, I still had an `add_languages(['cuda'], required: false)` further up; I suppose I need to disable all Meson-provided CUDA support?

@nh2 Shouldn't have to. Perhaps set `link_language: 'c'`? I'm guessing new-ish Meson spots that the source files of the generator were `.cu` and autoselects `nvlink`, but since I've handled the linking step, the regular linker should work.
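If newer Meson really does pick `nvlink` based on the `.cu` generator inputs, forcing the host C linker might look like this (an untested sketch):

```meson
exe = executable('myexe', 'main.c',
  dependencies: [libfoocudaCuda_dep_final],
  link_language: 'c')
```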
@obilaniu Getting rid of the `add_languages(['cuda'])` did make the above snippet work for me; I only had to add

```meson
nvcc_iflags = ['-I', '@CURRENT_SOURCE_DIR@/include']
```

I think I'll do that until the Meson CUDA support is fully fixed.
One thing that still needs fixing in this setup, though, is that there are multiple `libcudart.so` entries in the `ldd` output of my executables and my library, and only the last one of each resolves to a path; the others are `not found`:

```
% ldd build/myexe
...
libcudart.so.9.1 => not found
libcudart.so.9.1 => /nix/store/83b6yyzyk72zrp8s06fqw9bc0n9zk5wl-cudatoolkit-9.1.85.1-lib/lib/libcudart.so.9.1 (0x00007f92a180c000)
...
```
Adding `link_args: ['-lcudart']` to both my `library()` and `executable()` calls fixes that, but I wonder why it's necessary and how I can remove it. Usually having it on my `library()` producing a `.so` should be enough, as that carries dependencies through. I understand I may have to give it once, as https://devblogs.nvidia.com/separate-compilation-linking-cuda-device-code/ says:

> The CUDA Runtime API library is automatically linked when we use `nvcc` for linking, but we must explicitly link it (`-lcudart`) when using another linker.
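One way to state that only once, rather than repeating `link_args` on every target, would be to wrap `cudart` in a dependency object that the relevant targets share (a sketch; whether `find_library` locates `cudart` depends on the local toolchain setup):

```meson
cc = meson.get_compiler('c')
cudart_dep = cc.find_library('cudart', required: true)

# Attach the dependency wherever the host linker does the final link.
lib = library('foo', 'host.cpp', dependencies: [cudart_dep])
exe = executable('myexe', 'main.cpp',
  link_with: lib,
  dependencies: [cudart_dep])
```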
Related (just for linking issues): https://github.com/mesonbuild/meson/issues/1003#issuecomment-445692103
When building a shared_library with mixed C++ and CUDA that has a link dependency on OpenMP, Meson passes `-fopenmp` to `nvcc`, causing the build to fail. The following should demonstrate the error as an MWE:

In reality, the OpenMP link dependency is introduced when using pkg-config via the `dependency` method to find a transitive dependency of the shared library that fails to build.