
Cuda compiler and OpenMP #4993

Open jml1795 opened 5 years ago

jml1795 commented 5 years ago

When building a shared_library with mixed C++ and CUDA sources that has a link dependency on OpenMP, Meson passes -fopenmp to nvcc, causing the build to fail.

The following should demonstrate the error as an MWE:

example = shared_library('example', 'file.cpp', 'file.cu', link_args: '-fopenmp')

In reality, the OpenMP link dependency is introduced when using pkg-config via the dependency() method to find a transitive dependency of the shared library, and the build then fails in the same way.
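
Roughly, the real setup looks like this (the names here are hypothetical):

foo_dep = dependency('foo')  # found via pkg-config; foo.pc carries -fopenmp in its Libs
example = shared_library('example', 'file.cpp', 'file.cu',
    dependencies: foo_dep)   # Meson then passes -fopenmp to nvcc at link time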

jml1795 commented 5 years ago

@obilaniu I pushed a commit to my fork that demonstrates a hacky fix. I don't really like it because it is specific to this one flag, but I think it shows the problem more clearly. The commit isn't meant to be merged, just for demonstration: commit

Also, possibly related: which tool do we want to do the linking for CUDA + C++ libraries, nvcc or the C++ linker? It seems like using the C++ toolchain as much as possible would avoid some of the headache of wrapping flags in -Xcompiler. As an experiment, I adjusted the order of clink_langs in that same commit, which made the C++ linker get used. If we do swap them, we should only need to link in cudart when CUDA code is present, and that should happen behind the scenes so the user doesn't have to do a manual find_library and dependency declaration.

obilaniu commented 5 years ago

@jml1795 I cannot see that commit.

Also, possibly related: which tool do we want to do the linking for CUDA + C++ libraries, nvcc or the C++ linker? It seems like using the C++ toolchain as much as possible would avoid some of the headache of wrapping flags in -Xcompiler.

Oh, this I have studied. If the linking is not done by nvcc but by CUDA-unaware tools like ld, then the final link requires an extra step that extracts the special device sections of the nvcc-generated host object files and combines them into a new host object file: nvcc -dlink -o devicefinallink.o objs*.o. This devicefinallink.o file and the objs*.o are then linked by the CUDA-unaware linker into the final shared library or executable.

Since the CUDA-unaware linker blissfully ignores the special device sections of the host object files, it effectively drops them; this very problem is why the -dlink step is required. That step pulls the special device sections containing device code out of the host object files and, after device-linking them, embeds the result in a "regular" data section of a new host object file, ready for linking and reference from host code. Failing to do this results in mystery undefined symbols referenced from __cudaRegisterLinkedBinary.


As a final detail, this linking step should be done with the C++ compiler/linker, because the CUDA kernel-launch syntax <<<>>> generates some C++ behind the scenes, unless -fno-exceptions is used. It is non-trivial to detect whether a .cu file is linkable with just the C library or needs the C++ library, so to play it safe we should always assume the C++ library is required.


I believe we should use the host linker. The final link must be done by one tool, and if the presence of a single CUDA file causes us to require nvcc, then the next time someone requires special link tools, we'll have an unsolvable conflict.

I believe we should track the compiler that produced an object file, if this isn't being done already. If an object file came from gcc, then it definitely does not contain special GPU device sections, but if it came from nvcc, it might. All objects from nvcc that participate in the final link of a shared library or executable target would then also go through the extra, preparatory nvcc -dlink step.


CUDA code presence is non-trivial to detect; not every .cu file necessarily contains any. I'd prefer to leave -cudart none/static/shared in the hands of the user as a cuda_args:.
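
For illustration, leaving that choice to the user might look roughly like this (target and file names are hypothetical):

shared_library('example', 'kernels.cu', 'host.cpp',
    cuda_args: ['-cudart', 'shared'])  # or 'none' / 'static', per project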

jml1795 commented 5 years ago

@obilaniu I updated the link.

So is the plan to eventually use -dlink for the intermediate object and then link with ld (for example)? I'm assuming that's what you mean by "host", at least.

obilaniu commented 5 years ago

@jml1795 Yes: using the CUDA linker would mean relying on nvcc, whereas going through the host system linker would entail -dlink'ing first and then supplying the result as an extra, but otherwise completely normal, object.
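
In Meson terms, a rough sketch of that flow (everything here is hypothetical; assume nvcc was located with find_program() and nvcc_gen is a generator that compiles .cu files to objects):

cuda_objs = nvcc_gen.process(cuda_srcs)      # host objects that still carry device sections
dlinked   = custom_target('devicefinallink',
    command: [nvcc, '-dlink', '@INPUT@', '-o', '@OUTPUT@'],
    input:   cuda_objs,
    output:  'devicefinallink.o')
# Both the original objects and devicefinallink.o then go to the ordinary host
# linker, which also needs cudart spelled out explicitly:
executable('app', 'main.cpp', cuda_objs, dlinked,
    link_args: ['-lcudart'])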

zephyr111 commented 5 years ago

According to the Meson documentation, I think the proper way to add OpenMP is to use something like the following line.

dependency('openmp', version: '>=4.0', language: 'cpp', required: false)

However, in practice it does not work and throws the following error, even though the language is cpp and not cuda... So there may be multiple bugs related to this overall issue.

ERROR: Language Cuda does not support OpenMP flags.

dcbaker commented 5 years ago

That error is because the CUDA compiler doesn't specify any flags for OpenMP, so it falls through to the default "not supported".

nh2 commented 4 years ago

I'm having some trouble compiling CUDA with the latest Meson and would like to inquire what the state of this is.

Meson decides to use nvcc as the linker for me (probably because this issue isn't solved yet). Fine by me, but:

I see -Xlinker=-fopenmp being passed to nvcc, but that makes ld error out with:

-f may not be used without -shared

I think that's because -Xlinker is wrong: -fopenmp is only accepted by the compiler (i.e. gcc), not by the linker.

What options do I have?
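
One option I can think of (an untested sketch; nvcc forwards -Xcompiler flags to the host gcc, which does accept -fopenmp) would be to wrap the flag myself:

example = shared_library('example', 'file.cpp', 'file.cu',
    link_args: ['-Xcompiler=-fopenmp'])  # instead of the -Xlinker=-fopenmp that Meson emits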

obilaniu commented 4 years ago

@nh2 Ay, this is not good. That means there's a break in Meson's support of the CUDA compiler.

For a personal project of mine, which antedated the arrival of CUDA support in Meson, I used nvcc as a generator to compile object files, which would be gathered into a "pre-dlink" static library. I would then use nvcc again as a generator to -dlink the "pre-dlink" static library's objects into a single host object file. This single object would then get linked into a "post-dlink" static library. The "post-dlink" static library would then be what gets used everywhere it's needed.

That gave me full control of the flags to nvcc but did result in pointless named static libraries. I badly want CMake-style object_library() but I don't think they've been appreciated (https://github.com/mesonbuild/meson/pull/622#issuecomment-251590086).

nh2 commented 4 years ago

@obilaniu Do you have an example code snippet for your old approach? That would at least allow me to get my project building in the meantime :)

obilaniu commented 4 years ago

@nh2

#
# Compile generator and linker args. (nvcc and the various *_flag(s) variables
# are defined elsewhere in this project.)
#
nvccCompGen   = generator(nvcc,
    arguments: nvcc_ccbin_flag + nvcc_gencode_flags + nvcc_cudart_flags + nvcc_opt_flags + nvcuvid_iflags +
               ['-x', 'cu', '@EXTRA_ARGS@', '@INPUT@', '-c', '-o', '@OUTPUT@'],
    output:    ['@BASENAME@.o'],
)
nvccLinkArgs  = ['-dlink'] + nvcc_ccbin_flag + nvcc_cudart_flags + nvcc_gencode_flags

### CUDA code compilation
libfoocuda_cdefs     = ['-DFOO_CUDA_IS_SHARED=1']
libfoocudaCuda_srcs = files('kernels.cu')
libfoocudaCuda_objs = nvccCompGen.process(libfoocudaCuda_srcs, extra_args: [
    '-Xcompiler', '-fPIC',
    '-Xptxas', '--warn-on-double-precision-use,-O3',
    '-DFOO_CUDA_IS_BUILDING=1',
    '-DFOO_IS_SHARED=1',
] + libfoocuda_cdefs + foo_iflags)
libfoocudaCuda_sta  = static_library('pre-dlink', libfoocudaCuda_objs)
libfoocudaCuda_dep  = declare_dependency(link_whole: libfoocudaCuda_sta)
libfoocudaCuda_objs = custom_target('libfoocudaCuda-dlink',
    command         : [nvcc, '-shared'] + nvccLinkArgs + ['@INPUT@', '-o', '@OUTPUT@'],
    input           : libfoocudaCuda_sta,
    output          : ['@BASENAME@-dlink.o'],
    build_by_default: true,
    install         : false
)
libfoocudaCuda_sta = static_library('dlink', libfoocudaCuda_objs)
libfoocudaCuda_dep = declare_dependency(dependencies: libfoocudaCuda_dep,
                                        link_whole:   libfoocudaCuda_sta)

nh2 commented 4 years ago

@obilaniu Questions:

  1. It gets me ERROR: Linker nvlink does not support link_whole
  2. Your variable names are repeated; would the following be accurate (postfixing the second set of variables with _final)?
nvcc_ccbin_flag = []
nvcc_gencode_flags = []
nvcc_cudart_flags = []
nvcc_opt_flags = []
nvcc_iflags = []

nvccCompGen   = generator(nvcc,
    arguments: nvcc_ccbin_flag + nvcc_gencode_flags + nvcc_cudart_flags + nvcc_opt_flags + nvcc_iflags +
               ['-x', 'cu', '@EXTRA_ARGS@', '@INPUT@', '-c', '-o', '@OUTPUT@'],
    output:    ['@BASENAME@.o'],
)
nvccLinkArgs  = ['-dlink'] + nvcc_ccbin_flag + nvcc_cudart_flags + nvcc_gencode_flags

### CUDA code compilation
libfoocuda_cdefs   = [] # C preprocessor -D flags go in here
libfoocudaCuda_srcs = files('kernels.cu')
libfoocudaCuda_objs = nvccCompGen.process(libfoocudaCuda_srcs, extra_args: [
    '-Xcompiler', '-fPIC',
    '-Xptxas', '--warn-on-double-precision-use,-O3',
] + libfoocuda_cdefs)
libfoocudaCuda_sta  = static_library('pre-dlink', libfoocudaCuda_objs)
libfoocudaCuda_dep  = declare_dependency(link_whole: libfoocudaCuda_sta)

libfoocudaCuda_objs_final = custom_target('libfoocudaCuda-dlink',
    command         : [nvcc, '-shared'] + nvccLinkArgs + ['@INPUT@', '-o', '@OUTPUT@'],
    input           : libfoocudaCuda_sta,
    output          : ['@BASENAME@-dlink.o'],
    build_by_default: true,
    install         : false
)
libfoocudaCuda_sta_final = static_library('dlink', libfoocudaCuda_objs_final)
libfoocudaCuda_dep_final = declare_dependency(dependencies: libfoocudaCuda_dep,
                                              link_whole:   libfoocudaCuda_sta_final)

nh2 commented 4 years ago

Linker nvlink does not support link_whole

@obilaniu Oh, I still had an add_languages(['cuda'], required: false) further up; I suppose I need to disable all Meson-provided CUDA support?

obilaniu commented 4 years ago

@nh2 You shouldn't have to. Perhaps set link_language: 'c'? I'm guessing newish Meson spots that the generator's source files were .cu and autoselects nvlink, but since I've handled the linking step myself, the regular linker should work.
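
For concreteness, a sketch of that suggestion (reusing the names from my snippet above):

executable('myexe', 'main.cpp',
    dependencies:  libfoocudaCuda_dep,
    link_language: 'cpp')  # or 'c'; .cu code usually wants the C++ runtime, though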

nh2 commented 4 years ago

@obilaniu Getting rid of the add_languages(['cuda']) did make the above snippet work for me; I only had to add

nvcc_iflags = ['-I', '@CURRENT_SOURCE_DIR@/include']

I think I'll do that until the Meson CUDA support is fully fixed.

One thing that still needs fixing in this setup, though, is that ldd shows multiple libcudart.so entries for my executables and my library, and only the last one of each resolves to a path; the other ones are not found:

% ldd build/myexe
        ...
        libcudart.so.9.1 => not found
        libcudart.so.9.1 => /nix/store/83b6yyzyk72zrp8s06fqw9bc0n9zk5wl-cudatoolkit-9.1.85.1-lib/lib/libcudart.so.9.1 (0x00007f92a180c000)
        ...

Adding link_args: ['-lcudart'] to both my library() and executable() calls fixes that, but I wonder why it's necessary and how I can remove it.

Usually, having it on my library() producing a .so should be enough, as that carries dependencies through. I understand I may have to give it once, as https://devblogs.nvidia.com/separate-compilation-linking-cuda-device-code/ says:

The CUDA Runtime API library is automatically linked when we use nvcc for linking, but we must explicitly link it (-lcudart) when using another linker.
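
In the meantime, a sketch of making cudart an explicit dependency instead of a raw link_args flag (the library path is an assumption for illustration):

cpp        = meson.get_compiler('cpp')
cudart_dep = cpp.find_library('cudart',
    dirs: ['/usr/local/cuda/lib64'])   # hypothetical CUDA install path
libfoo = library('foo', 'api.cpp',
    dependencies: cudart_dep)
myexe  = executable('myexe', 'main.cpp',
    link_with:    libfoo,
    dependencies: cudart_dep)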

nh2 commented 4 years ago

Related (just for linking issues): https://github.com/mesonbuild/meson/issues/1003#issuecomment-445692103