mesonbuild / meson

The Meson Build System
http://mesonbuild.com
Apache License 2.0

Meson outputs incorrectly-named shared libraries on OpenBSD #3570

Open kernigh opened 6 years ago

kernigh commented 6 years ago

Meson can't find shared libraries like /usr/lib/libm.so.10.1 in OpenBSD, but uses static libraries like /usr/lib/libm.a. OpenBSD uses ELF shared libraries but (unlike other ELF systems) has no symbolic link named libm.so. I have OpenBSD/amd64 6.3 and meson 0.47.0.dev1 (8a9f7cf).

Sometimes the compiler still uses the shared library, but sometimes it doesn't. For example, OpenBSD's package of libiconv installs /usr/local/lib/libiconv.a and /usr/local/lib/libiconv.so.6.0. The compiler and Meson don't look in /usr/local by default, and libiconv doesn't use pkg-config, so I will tell Meson to look for libiconv in /usr/local if the system is OpenBSD.

I have this C program iopen.c

#include <iconv.h>
#include <stdio.h>

int main(int argc, char **argv) {
    iconv_t cd = iconv_open("UTF-8", "EUC-JP");
    printf("got %lld", (long long)cd);
}

and its meson.build

project('cbrt', 'c')
cc = meson.get_compiler('c')
dirs = []
if host_machine.system() == 'openbsd'
    dirs += '/usr/local/lib'
    add_global_arguments('-I/usr/local/include', language: 'c')
endif
libiconv = cc.find_library('iconv', dirs: dirs, required: false)
executable('iopen', 'iopen.c', dependencies: libiconv)

Now I build it:

$ meson build
The Meson build system
Version: 0.47.0.dev1
Source dir: /home/kernigh/park/example
Build dir: /home/kernigh/park/example/build
Build type: native build
Project name: cbrt
Native C compiler: cc (clang 5.0.1 "OpenBSD clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1)")
Build machine cpu family: x86_64
Build machine cpu: x86_64
Library iconv found: YES
Build targets in project: 1
Found ninja-1.8.2 at /usr/local/bin/ninja
$ ninja -vC build
ninja: Entering directory `build'
[1/2] cc -Iiopen@exe -I. -I.. -I/usr/local/include -Xclang -fcolor-diagnostics -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -O0 -g  -MD -MQ 'iopen@exe/iopen.c.o' -MF 'iopen@exe/iopen.c.o.d' -o 'iopen@exe/iopen.c.o' -c ../iopen.c
[2/2] cc  -o iopen 'iopen@exe/iopen.c.o' -Wl,--no-undefined -Wl,--as-needed -Wl,--start-group /usr/local/lib/libiconv.a -Wl,--end-group

Meson used /usr/local/lib/libiconv.a, not /usr/local/lib/libiconv.so.6.0.

Meson might need to look for libraries named libiconv.so.X.Y, and pick the one with the highest version; see "Understanding shared libraries number rules".

nirbheek commented 6 years ago

Meson might need to look for libraries named libiconv.so.X.Y, and pick the one with the highest version; see "Understanding shared libraries number rules".

After reading that link, it seems that we should also not set -Wl,-soname on OpenBSD and also not install the libfoo.so and libfoo.so.x symlinks? Also the versioning should be libfoo.so.x.y and not libfoo.so.x.y.z like everywhere else?

Is this only OpenBSD or do other BSDs also follow this library naming?

kernigh commented 6 years ago

Only OpenBSD has this naming.

In an old FreeBSD 9.1 system, /usr/local/lib has a static library libiconv.a, a shared library libiconv.so.3, and a symbolic link from libiconv.so to libiconv.so.3.

In an old NetBSD 6.0.1 system, /usr/pkg/lib has a static library libiconv.a, a shared library libiconv.so.2.5.1, and symbolic links from libiconv.so to libiconv.so.2.5.1, and from libiconv.so.2 to libiconv.so.2.5.1.

jpakkane commented 6 years ago

Meson might need to look for libraries named libiconv.so.X.Y, and pick the one with the highest version

What is the rationale for this? It seems strange that every single build system that wants to support OpenBSD needs to reimplement identical behaviour for this use case. Why does the system not provide a method to do this automatically (it does not need to be the .so symlink thing as on other Unixes, but it should at least have something).

nirbheek commented 6 years ago

Why does the system not provide a method to do this automatically (it does not need to be the .so symlink thing as on other Unixes, but it should at least have something).

Going by the documentation, -lfoo works as expected. However, we do manual searching of libraries when dirs: is passed to cc.find_library(), and that needs to be special-cased for OpenBSD.

kernigh commented 6 years ago

I didn't notice until now, but Meson actually can't build a shared library in C for OpenBSD. Given this meson.build

project('example', 'c')
shared_library('duck', 'duck.c')

and duck.c

#include <stdio.h>

void quack(void) {
  puts("Kvack!");
}

then meson build succeeds, but ninja -C build fails:

$ ninja -C build
ninja: Entering directory `build'
[2/2] Linking target libduck.so.
FAILED: libduck.so 
cc  -o libduck.so 'duck@sha/duck.c.o' -Wl,--no-undefined -Wl,--as-needed -shared -fPIC -Wl,--start-group -Wl,-soname,libduck.so -Wl,--end-group  
duck@sha/duck.c.o: In function `quack':
../duck.c:4: undefined reference to `puts'
cc: error: linker command failed with exit code 1 (use -v to see invocation)
ninja: build stopped: subcommand failed.

If I remove -Wl,--no-undefined, then it works. My shared library calls puts(3), a standard C function in libc. Meson is trying to use -Wl,--no-undefined to check that my shared library doesn't call undefined functions; but the C compiler is trying to link my shared library without libc, so puts(3) and all other libc functions are undefined.

My amd64 machine runs OpenBSD 6.3, the compiler is clang 5.0.1, and the linker is GNU ld 2.17. The shared libraries and executables are in ELF format, like in other BSDs and Linux, but OpenBSD has a strange way to find shared libraries. Another oddity is that shared libraries in OpenBSD never link to libc:

$ ldd /usr/local/lib/libpython3.6m.so.0.0  
/usr/local/lib/libpython3.6m.so.0.0:
        Start            End              Type  Open Ref GrpRef Name
        00000d205a88c000 00000d205ad61000 dlib  2    0   0      /usr/local/lib/libpython3.6m.so.0.0
        00000d205fa7c000 00000d205fc87000 rlib  0    1   0      /usr/local/lib/libintl.so.6.0
        00000d210e71d000 00000d210e926000 rlib  0    1   0      /usr/lib/libpthread.so.25.1
        00000d2102c60000 00000d2102e6d000 rlib  0    1   0      /usr/lib/libutil.so.13.0
        00000d20d4632000 00000d20d485a000 rlib  0    1   0      /usr/lib/libm.so.10.1
        00000d2090ef9000 00000d20911f6000 rlib  0    1   0      /usr/local/lib/libiconv.so.6.0

If I remember right, GNU/Linux has libc.so.6, and NetBSD has libc.so.12. These version numbers never change, so if every shared library links to libc.so.6 or libc.so.12, then everything still works, and Meson can use -Wl,--no-undefined to check if functions like puts(3) exist in libc.

In OpenBSD, the version numbers often change. After about 8 upgrades, my OpenBSD machine has libc versions 73.1, 77.0, 78.1, 80.1, 84.2, 88.0, 89.3, 90.0, 92.3. My header files are for 92.3. The C compiler links every executable to 92.3. It doesn't link any shared library to 92.3. I don't know why, but the C compiler might be trying to stop me from loading multiple versions of libc in one program. I have encountered programs that load multiple versions of the same library, but libc might be special in some way.

jpakkane commented 6 years ago

If I remove -Wl,--no-undefined, then it works.

What if you remove -Wl,--as-needed?

kernigh commented 6 years ago

@jpakkane, removing -Wl,--as-needed doesn't help.

Meson's command was cc -o libduck.so 'duck@sha/duck.c.o' -Wl,--no-undefined -Wl,--as-needed -shared -fPIC -Wl,--start-group -Wl,-soname,libduck.so -Wl,--end-group

If I remove -Wl,--as-needed and keep -Wl,--no-undefined, then I get the same error: ../duck.c:4: undefined reference to `puts'

If I remove -Wl,--no-undefined, then it works and libduck.so is output. It works whether or not I have -Wl,--as-needed. (I had reported in #3593 "c++ -Wl,--as-needed doesn't work in OpenBSD" that I can't link an executable with -Wl,--as-needed.)
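As a rough illustration of the workaround discussed here, the sketch below filters a link command's arguments when the target system is OpenBSD. The function name `filter_link_args` and the set `OPENBSD_UNSAFE_LINK_ARGS` are hypothetical and are not part of Meson. The system name 'openbsd' follows the value that `host_machine.system()` returns in the meson.build above. Dropping -Wl,--as-needed as well is an assumption based on #3593.

```python
# Hypothetical helper, not Meson's actual code: drop the linker flags that
# this thread and #3593 report as failing on OpenBSD.
OPENBSD_UNSAFE_LINK_ARGS = {"-Wl,--no-undefined", "-Wl,--as-needed"}


def filter_link_args(args, system):
    """Return a copy of args; on OpenBSD, remove the flags listed above."""
    if system != "openbsd":
        return list(args)
    return [arg for arg in args if arg not in OPENBSD_UNSAFE_LINK_ARGS]


link_args = ["-Wl,--no-undefined", "-Wl,--as-needed", "-shared", "-fPIC"]
print(filter_link_args(link_args, "openbsd"))  # ['-shared', '-fPIC']
print(filter_link_args(link_args, "linux"))    # all four flags, unchanged
```

A real fix would need to know whether the build targets OpenBSD, which is a separate problem from filtering the flags.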

kernigh commented 6 years ago

OpenBSD's "Avoid DT_SONAME hardcoding" says that -Wl,-soname is "usually not desirable on OpenBSD". Some build systems don't use -Wl,-soname in OpenBSD, so some shared libraries don't have SONAME. Meson has a problem with these libraries. The problem is that cc -Lpath -lname works but cc path/libname.so.0.0 doesn't work.

In a typical ELF system (Linux and probably FreeBSD, NetBSD, illumos), a shared library must have a SONAME with its major version. The linker ld opens a library like libm.so and reads a SONAME like libm.so.6. Then ld sets NEEDED to libm.so.6:

$ uname
Linux
$ readelf -d /usr/lib/powerpc-linux-gnu/libm.so | grep SONAME
 0x0000000e (SONAME)                     Library soname: [libm.so.6]
$ readelf -d /usr/bin/awk | grep NEEDED
 0x00000001 (NEEDED)                     Shared library: [libm.so.6]
 0x00000001 (NEEDED)                     Shared library: [libc.so.6]

In OpenBSD, ld opens a library like libm.so.10.1, and I want it to set NEEDED to libm.so.10.1. This works if the library has libm.so.10.1 in its SONAME, or if the library has no SONAME but ld uses the filename libm.so.10.1 for NEEDED.

The problem is if the library has no SONAME, and ld doesn't use the filename for NEEDED. This happens if I don't use -l but I pass a path to the library:

$ uname
OpenBSD
$ readelf -d /usr/local/lib/libiconv.so.6.0 | grep SONAME
$ echo 'int main(void){}' >main.c
$ cc -o main-l main.c -L/usr/local/lib -liconv
$ readelf -d main-l | grep NEEDED
 0x0000000000000001 (NEEDED)             Shared library: [libiconv.so.6.0]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.92.3]
$ cc -o main-p main.c /usr/local/lib/libiconv.so.6.0
$ readelf -d main-p | grep NEEDED
 0x0000000000000001 (NEEDED)             Shared library: [/usr/local/lib/libiconv.so.6.0]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.92.3]

An absolute path in NEEDED isn't too bad, but Meson can pass a relative path to the library, and ld can put the relative path in NEEDED. Then ld.so can't find the library! I had modified Meson to stop using -Wl,--no-undefined and -Wl,-soname. Meson was passing some tests but failing test cases/common/46 library chain:

$ ninja
[11/11] Linking target prog.
FAILED: prog 
cc  -o prog 'prog@exe/main.c.o' -Wl,--start-group subdir/liblib1.so -Wl,--end-group -Wl,-z,origin '-Wl,-rpath,$ORIGIN/subdir/subdir3:$ORIGIN/subdir/subdir2:$ORIGIN/subdir' '-Wl,-rpath-link,/home/kernigh/park/meson/test cases/common/46 library chain/build/subdir/subdir3:/home/kernigh/park/meson/test cases/common/46 library chain/build/subdir/subdir2:/home/kernigh/park/meson/test cases/common/46 library chain/build/subdir'  
/usr/bin/ld: warning: subdir/subdir2/liblib2.so, needed by subdir/liblib1.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: subdir/subdir3/liblib3.so, needed by subdir/liblib1.so, not found (try using -rpath or -rpath-link)
subdir/liblib1.so: undefined reference to `lib2fun'
subdir/liblib1.so: undefined reference to `lib3fun'
cc: error: linker command failed with exit code 1 (use -v to see invocation)
ninja: build stopped: subcommand failed.
$ readelf -d subdir/liblib1.so | grep -E 'NEEDED|RPATH'
 0x0000000000000001 (NEEDED)             Shared library: [subdir/subdir2/liblib2.so]
 0x0000000000000001 (NEEDED)             Shared library: [subdir/subdir3/liblib3.so]
 0x000000000000000f (RPATH)              Library rpath: [$ORIGIN/subdir3:$ORIGIN/subdir2]

The rpath would allow liblib1.so to find liblib2.so and liblib3.so, but NEEDED is set wrong. This happened because lib2 and lib3 don't have SONAME, and Meson didn't use -l when it linked lib1 to lib2 and lib3.
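One way to spot this condition is to scan `readelf -d` output for NEEDED entries that contain a slash. The sketch below only parses text shaped like the readelf output shown in this thread; the function name `needed_with_paths` is hypothetical, and the sample combines lines from the two readelf runs above.

```python
def needed_with_paths(readelf_output):
    """Return NEEDED entries that hold a path (contain '/') instead of a bare name."""
    bad = []
    for line in readelf_output.splitlines():
        if "(NEEDED)" not in line:
            continue
        # The library name sits between the last '[' and the last ']'.
        name = line[line.rindex("[") + 1:line.rindex("]")]
        if "/" in name:
            bad.append(name)
    return bad


# Sample assembled from the readelf output quoted in this thread.
sample = """\
 0x0000000000000001 (NEEDED)             Shared library: [subdir/subdir2/liblib2.so]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.92.3]
 0x000000000000000f (RPATH)              Library rpath: [$ORIGIN/subdir3:$ORIGIN/subdir2]
"""
print(needed_with_paths(sample))  # ['subdir/subdir2/liblib2.so']
```

An empty result means every NEEDED entry is a bare library name that ld.so can look up through the rpath or its default search path.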

OpenBSD's package of CMake splits /usr/local/lib/libiconv.so.6.0 into -L/usr/local/lib -liconv in the link command. I might want to teach Meson to do the same. Or I might want to make a directory of symlinks, where symlinks/libiconv.so.6.0 points to /usr/local/lib/libiconv.so.6.0, so I can do -Lsymlinks -liconv and avoid using libiconv.so.7.0 by mistake. Or I might want to change how OpenBSD's linker sets NEEDED. (OpenBSD uses GNU ld 2.17, an old version.) I have not yet written the code. I might go looking for someone who knows OpenBSD shared libraries better than I do.
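The first option, splitting a library path into -L and -l, could look roughly like the sketch below. It is not taken from CMake or Meson; the name `split_library_path` and the regular expression are assumptions for illustration. It has the weakness already mentioned: -liconv may pick a different version from the same directory.

```python
import os
import re

# Matches lib<name>.so optionally followed by numeric components (.6.0, .3, ...).
_SHLIB_RE = re.compile(r"^lib(?P<name>.+?)\.so(?:\.\d+)*$")


def split_library_path(path):
    """Turn '/dir/libfoo.so.X.Y' into ['-L/dir', '-lfoo'].

    Paths that do not look like a shared library are returned unchanged
    as a one-element list.
    """
    directory, filename = os.path.split(path)
    match = _SHLIB_RE.match(filename)
    if match is None or not directory:
        return [path]
    return ["-L" + directory, "-l" + match.group("name")]


print(split_library_path("/usr/local/lib/libiconv.so.6.0"))
# ['-L/usr/local/lib', '-liconv']
print(split_library_path("/usr/local/lib/libiconv.a"))
# ['/usr/local/lib/libiconv.a']
```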

jpakkane commented 6 years ago

OpenBSD's package of CMake splits /usr/local/lib/libiconv.so.6.0 into -L/usr/local/lib -liconv in the link command.

Interestingly in Meson we try to do the exact opposite as much as possible. -L alters global state and can thus lead to interesting bugs. Pointing directly at the library files you want to link with is the only reliable way to work when you have more than one version of any library with the same name. This is very common when, for example, developing a new version of a system library.

kernigh commented 6 years ago

I made a new branch https://github.com/kernigh/meson/tree/kernigh-openbsd-1 to teach Meson about OpenBSD's libfoo.so.X.Y shared libraries.

There's no pull request, because I made a mess. The first mess is how to detect OpenBSD. I want to call mesonlib.for_openbsd(is_cross, environment). Compilers have self.is_cross but don't have environment, so compilers don't know if they are compiling for OpenBSD. I tried adding GCC_OPENBSD and CLANG_OPENBSD types, but undid that because OpenBSD needs to be GCC_STANDARD and CLANG_STANDARD. I now have a workaround to disable -Wl,--no-undefined and -Wl,--as-needed on OpenBSD. The code for clang is hard to read because it has two version variables: one holds the first line of clang --version and the other holds all lines, but the names don't tell you which is which.

The second mess is how to pass -Wl,-soname,libfoo.so.X.Y when the code doesn't know Y. The code might need some refactoring. The existing code makes the string libfoo.so.X (for Linux and most BSDs) when making aliases, and remakes the string libfoo.so.X when passing soname. The refactored code might make libfoo.so.X only once.
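To make the refactoring idea concrete, here is a minimal sketch that builds the soname string in one place so that both the alias code and the -Wl,-soname argument can reuse it. The function `shared_library_soname` and its single `version` string are hypothetical simplifications; they do not reflect how Meson actually stores version information.

```python
def shared_library_soname(name, version, system):
    """Build the soname once so aliases and -Wl,-soname can share it.

    version is a dotted string such as '1.2.3' (or '1.2' on OpenBSD).
    """
    parts = version.split(".")
    if system == "openbsd":
        # Assumption from this thread: OpenBSD wants libfoo.so.X.Y.
        if len(parts) < 2:
            raise ValueError("OpenBSD needs a major.minor version, got %r" % version)
        suffix = ".".join(parts[:2])
    else:
        # Linux and most BSDs: libfoo.so.X
        suffix = parts[0]
    return "lib%s.so.%s" % (name, suffix)


soname = shared_library_soname("foo", "1.2", "openbsd")
print(soname)                                           # libfoo.so.1.2
print("-Wl,-soname," + soname)                          # -Wl,-soname,libfoo.so.1.2
print(shared_library_soname("foo", "1.2.3", "linux"))   # libfoo.so.1
```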

nirbheek commented 6 years ago

https://github.com/mesonbuild/meson/pull/3851 will fix the library search portion of this bug. What else would be remaining for this besides the soname value?

kernigh commented 6 years ago

Meson's master, having merged #3851, can now find shared libraries like libiconv.so.6.0.

I found a problem: if there are multiple versions of the same library, Meson picks any one of them. Meson might pick a version that is too old, then the build might be wrong. I had deleted most of my old libraries, so my builds are working for now. To demonstrate the problem on OpenBSD, I have duck.c

const char *quack(void) { return V; }

and new.sh

set -e
dir=/tmp/lib
mkdir -p "$dir"
for version in "$@"; do
  file=libduck.so.$version
  cc -shared -o "$dir/$file" -Wl,-soname,"$file" -DV=\""$version"\" duck.c
  echo made "$file"
done

and example.c

#include <stdio.h>
const char *quack(void);
int main(void) { printf("Quack %s\n", quack()); }

and meson.build

project('example', 'c')
dir = '/tmp/lib'
libduck = meson.get_compiler('c').find_library('duck', dirs: dir)
executable('example', 'example.c', dependencies: libduck,
  build_rpath: dir, install_rpath: dir)

I build 5 major versions of the library, then build the program.

$ sh new.sh 3.0 4.0 5.0 1.0 2.0
made libduck.so.3.0
made libduck.so.4.0
made libduck.so.5.0
made libduck.so.1.0
made libduck.so.2.0
$ meson build
...
Library duck found: YES
...
$ ninja -C build
ninja: Entering directory `build'
[2/2] Linking target example.
$ build/example
Quack 3.0

Meson picked version 3.0 instead of version 5.0. If the program needs some feature that wasn't in 3.0, or some struct changed between 3.0 and 5.0, then the program might not work now.

I want Meson to pick the highest version. In my draft code, I tried to pick the highest version using a regexp ending in '\.([0-9]+)\.([0-9]+)\Z' to capture version = tuple(map(int, match.group(1, 2))) to compare version > best_v. The code by @nirbheek uses a glob pattern to get the correct filenames; perhaps the pattern should come with a function that turns libduck.so.5.0 into (5, 0), so one can use Python's max().
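A minimal sketch of that idea follows, using the same regexp shape and Python's max(). The names `openbsd_lib_version` and `pick_highest` are hypothetical and are not taken from Meson or from either branch.

```python
import re

# Same shape as the regexp quoted above: two trailing numeric components.
_VERSION_RE = re.compile(r"\.([0-9]+)\.([0-9]+)\Z")


def openbsd_lib_version(filename):
    """Turn 'libduck.so.5.0' into (5, 0); return None if there is no X.Y suffix."""
    match = _VERSION_RE.search(filename)
    if match is None:
        return None
    return tuple(map(int, match.group(1, 2)))


def pick_highest(filenames):
    """Return the filename with the highest (major, minor), or None if none match."""
    versioned = [f for f in filenames if openbsd_lib_version(f) is not None]
    return max(versioned, key=openbsd_lib_version, default=None)


candidates = ["libduck.so.3.0", "libduck.so.4.0", "libduck.so.5.0",
              "libduck.so.1.0", "libduck.so.2.0"]
print(pick_highest(candidates))  # libduck.so.5.0
```

Comparing integer tuples rather than strings matters here: as text, "9.1" sorts after "10.0", but (10, 0) is greater than (9, 1).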

nirbheek commented 6 years ago

I was under the impression that OpenBSD used separate install directories for each major version of a library. Otherwise, how can you know which one is compatible with your project?

If picking the highest major and minor version is what the linker does, we can duplicate that behaviour. Will write a patch for that tomorrow.

Just as important though, is another issue: we have no CI for OpenBSD, so we can't write integration tests and we can never know if PRs will break your platform.

Rust has a CI setup for OpenBSD and FreeBSD that uses qemu, it would be extremely good for first-class support if we had that too. Do you think you or @ajacoutot could work on that?

nirbheek commented 6 years ago

The fix for searching is at: https://github.com/mesonbuild/meson/pull/3861. What's remaining is ensuring that we output shared libraries with the correct naming.

ajacoutot commented 6 years ago

Hi @nirbheek

separate install directories for each major version of a library. Otherwise, how can you know which one is compatible with your project

When different versions of a shared library are installed, ld will load the one that your binary was linked against. When dlopening, it will pick the highest version (you don't need to specify the version, just dlopen libfoo.so).

And when linking, it will pick up the highest version. i.e. if you have libfoo.so.1.0 and libfoo.so.2.0, -lfoo will use libfoo.so.2.0. I believe you can choose the exact library version if you use the full path to it.

Regarding the CI, I guess I can try and find some time to work on it (no promise that it will happen soon). Do you guys have a CI for other systems already? So that I can match what you're already doing. Thanks.

nirbheek commented 6 years ago

When different versions of a shared library are installed, ld will load the one that your binary was linked against.

That's the runtime linker. My question was about the compile-time linker; it seems wrong to have to remove libfoo.so.4.0 from your library path if you want to link against libfoo.so.3.0. Anyway, did you get a chance to test out https://github.com/mesonbuild/meson/pull/3861?

Regarding the CI, I guess I can try and find some time to work on it (no promise that it will happen soon). Do you guys have a CI for other systems already? So that I can match what you're already doing.

Yes, we have CI for: linux and macos with Travis:

https://github.com/mesonbuild/meson/blob/master/.travis.yml

Windows (msvc), windows (mingw), cygwin with Appveyor:

https://github.com/mesonbuild/meson/blob/master/.appveyor.yml

For OpenBSD and FreeBSD you will want to duplicate in Travis what the Rust folks use:

https://github.com/nbaksalyar/rust-libc/tree/master/ci

kernigh commented 6 years ago

A machine may keep old versions of shared libraries, but only the latest version of the header files. For example, my OpenBSD machine has libc.so.92.3, libc.so.90.0, libc.so.89.3, ... but headers like /usr/include/stdio.h are for 92.3. I can use libc.so.90.0 to run old programs, but I can't build anything with 90.0, because I don't have 90.0's headers.

If I need an older libfoo to build my project, then yes, I would need a separate directory for the library. That way, cc -I/dir/include -L/dir/lib -lfoo would use the older libfoo, not the newer one.

kernigh commented 6 years ago

This works: Meson can now find the latest version of libfoo.so.X.Y on OpenBSD.

These don't work:

I will not be able to work on this during the next several days.

nirbheek commented 6 years ago

I will not be able to work on this during the next several days.

We can fix all these remaining issues quite easily. What we can't do is ensure that things work correctly on OpenBSD, and that they will continue to do so in the future. Hence the CI is what we need you to work on (when you can). :)

The last time I tried to set that up, I ran into tons of issues getting the OS to boot and then to detect the qemu devices, so I had no network.

nirbheek commented 6 years ago

the C compiler is trying to link my shared library without libc, so puts(3) and all other libc functions are undefined.

Why does this happen? Do you need to explicitly pass -lc to the compiler for that to work?

kernigh commented 6 years ago

For some reason, libraries in OpenBSD don't link to libc:

$ ldd /usr/lib/libssl.so.45.1                                        
/usr/lib/libssl.so.45.1:
        Start            End              Type  Open Ref GrpRef Name
        00001b883f317000 00001b883f56b000 dlib  1    0   0      /usr/lib/libssl.so.45.1
        00001b88b1f52000 00001b88b232e000 rlib  0    1   0      /usr/lib/libcrypto.so.43.1

This libssl links to libcrypto but not to libc. This libssl has undefined symbols like malloc and opendir with no libc to define them. The flag -Wl,--no-undefined seems to turn such undefined symbols into errors.

I don't know why libraries in OpenBSD don't link to libc. I speculate that OpenBSD might not want to load multiple versions of libc in the same process. The program picks one version of libc, and libraries don't load a different version of libc. This might fail, because libraries might crash after the program picks an incompatible libc. I speculate that loading multiple versions of libc might fail in a different way.

Other systems might have just one version of libc, so every library can link to that version, and -Wl,--no-undefined can check that malloc and opendir are defined.

ajacoutot commented 6 years ago

@kernigh thanks. We are indeed patching our meson package for this (https://cvsweb.openbsd.org/cgi-bin/cvsweb/~checkout~/ports/devel/meson/patches/patch-mesonbuild_build_py?rev=1.11). I did not open a PR because the patch is made to work with the OpenBSD ports tree, where we explicitly stay in control of shared libs versioning.

The meson mk framework (that is used by ports requiring meson to build - https://cvsweb.openbsd.org/cgi-bin/cvsweb/~checkout~/ports/devel/meson/meson.port.mk?rev=1.26) has this as well:

# don't use "-Wl,--no-undefined" nor "-zdefs" when linking"; OpenBSD does not
# link libc into shared-libraries by default to avoid binding libraries to
# specific libc majors, so those options have always suffered false positives
CONFIGURE_ARGS +=   -Db_lundef=false

And yes yes yes, I need to work on a CI...