open-mpi / hwloc

Hardware locality (hwloc)
https://www.open-mpi.org/projects/hwloc

zlib is required even if pci and libxml2 are disabled #646

Closed nazar-pc closed 3 weeks ago

nazar-pc commented 9 months ago

What version of hwloc are you using?

2.10.0

Which operating system and hardware are you running on?

Ubuntu 22.04

Details of the problem

According to https://github.com/open-mpi/hwloc/blob/8b82269e321e44379b6e100d3b903401ed64d8a9/contrib/hwloc-valgrind.supp#L7, zlib is only needed for the libpci and libxml2 features, so I was hoping that after disabling those (--disable-io --disable-libxml2) zlib would no longer be necessary, but it turns out it is still linked for some reason.

I'm wondering whether this is expected behavior. Right now that is hard to tell, because by default the set of required libraries depends on which other packages are found on the system at compile time. That makes it very hard to give someone an exact list of dependencies with which hwloc is guaranteed to compile, especially when only a minimal build with CPU support is needed.

bgoglin commented 9 months ago

Hello. Building with --disable-io --disable-libxml2 here (on Debian) doesn't bring zlib into libhwloc.so, but it does bring it into lstopo (likely because of X11 libraries). Valgrind suppressions are mostly written to debug the core libhwloc.so, and it's already very hard to keep them up to date. lstopo brings in so many dependencies that maintaining suppressions for it would be horrible.

nazar-pc commented 9 months ago

I'm compiling a library on Ubuntu 22.04 and I'm not sure whether there is a way to disable lstopo on Linux (on Windows with CMake, HWLOC_SKIP_LSTOPO can be used).

When compiled in a container with minimal dependencies, zlib1g-dev is not needed, but in the GitHub Actions environment, where a lot of packages are preinstalled, it is suddenly required. I only noticed this because cross-compilation forced me to install zlib1g-dev:arm64 for the library to compile for aarch64.

Maybe the solution here would be to have --disable-lstopo, and maybe --disable-testing and --disable-tools, similar to CMake on Windows?

I initially reached for --enable-embedded-mode, but unfortunately it doesn't build a static library, so it was awkward to deal with.

bgoglin commented 9 months ago

First, can you check with ldd on lstopo and libhwloc.so that libz is only needed for lstopo?
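One way to run that check is sketched below. The helper name `links_lib` is hypothetical, and the two paths assume an in-tree libtool build: `hwloc/.libs/libhwloc.so` is the location used later in this thread, while `utils/lstopo/.libs/lstopo` is an assumption to adjust for your own build tree.

```shell
#!/bin/sh
# links_lib NAME: read `ldd` output on stdin and succeed if it lists
# lib<NAME>.so. Hypothetical helper, not part of hwloc.
links_lib() {
    grep -q "lib$1\.so"
}

# Assumed in-tree build locations; files that do not exist are skipped.
for bin in hwloc/.libs/libhwloc.so utils/lstopo/.libs/lstopo; do
    [ -e "$bin" ] || continue
    if ldd "$bin" | links_lib z; then
        echo "$bin: links libz"
    else
        echo "$bin: does not link libz"
    fi
done
```

The helper reads from stdin so that it can be fed saved `ldd` output from a CI log as well as a live `ldd` run.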

nazar-pc commented 9 months ago

I build this stuff in CI and don't build the shared library either.

The compiled executable (native x86-64) into which the static library was linked ended up with these dynamic libraries:

    linux-vdso.so.1 (0x00007ffdbdf2f000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f690bf19000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f690d515000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f690d4f5000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f690d4f0000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f690bc00000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f690d53b000)

And cross-compiled aarch64 looked like this:

    libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000005502a70000)
    libpthread.so.0 => /lib/aarch64-linux-gnu/libpthread.so.0 (0x0000005502b20000)
    libgcc_s.so.1 => /lib/aarch64-linux-gnu/libgcc_s.so.1 (0x0000005502b40000)
    libdl.so.2 => /lib/aarch64-linux-gnu/libdl.so.2 (0x0000005502b70000)
    libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000005502b90000)
    /lib/ld-linux-aarch64.so.1 (0x0000005500000000)

Neither of them ended up linking libz, yet the build clearly required it (I'm not strong in this area; maybe the linker removed unneeded libraries in the end):

```
error: linking with `aarch64-linux-gnu-gcc` failed: exit status: 1
  |
  = note: LC_ALL="C" PATH="/home/runner/.rustup/toolchains/nightly-2023-10-16-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/bin:/opt/hostedtoolcache/protoc/v23.4/x64/bin:/home/runner/work/subspace/subspace/llvm/bin:/snap/bin:/home/runner/.local/bin:/opt/pipx_bin:/home/runner/.cargo/bin:/home/runner/.config/composer/vendor/bin:/usr/local/.ghcup/bin:/home/runner/.dotnet/tools:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin" VSLANG="1033" "aarch64-linux-gnu-gcc" "/tmp/rustcAtiqXU/symbols.o" "/home/runner/work/subspace/subspace/target/aarch64-unknown-linux-gnu/production/deps/subspace_farmer-4ef2172de3fc95ae.subspace_farmer.c6b5885e48585486-cgu.11.rcgu.o" "-Wl,--as-needed" "-L" "/home/runner/work/subspace/subspace/target/aarch64-unknown-linux-gnu/production/deps" "-L" "/home/runner/work/subspace/subspace/target/production/deps" "-L" "/home/runner/work/subspace/subspace/target/aarch64-unknown-linux-gnu/production/build/blake3-bff7534b49010aad/out" "-L" "/home/runner/work/subspace/subspace/target/aarch64-unknown-linux-gnu/production/build/hwlocality-sys-50026434af8e2a9e/out/lib" "-L" "/home/runner/work/subspace/subspace/target/aarch64-unknown-linux-gnu/production/build/ring-f64a2e2010767b6a/out" "-L" "/home/runner/work/subspace/subspace/target/aarch64-unknown-linux-gnu/production/build/ring-1be1453bc52910d5/out" "-L" "/home/runner/work/subspace/subspace/target/aarch64-unknown-linux-gnu/production/build/libmimalloc-sys-93591c37a6a51563/out" "-L" "/home/runner/work/subspace/subspace/target/aarch64-unknown-linux-gnu/production/build/blst-1988681169e92acc/out" "-L" "/home/runner/work/subspace/subspace/target/aarch64-unknown-linux-gnu/production/build/zstd-sys-80d84ceafe298d04/out" "-L" "/home/runner/.rustup/toolchains/nightly-2023-10-16-x86_64-unknown-linux-gnu/lib/rustlib/aarch64-unknown-linux-gnu/lib" "-Wl,-Bstatic" "/tmp/rustcAtiqXU/liblibmimalloc_sys-e72d1de4cb1f5cca.rlib" "/tmp/rustcAtiqXU/libzstd_sys-a10d3259f5b5f954.rlib" "/tmp/rustcAtiqXU/libhwlocality_sys-e4afe4c870b6ad35.rlib" "/tmp/rustcAtiqXU/libring-49c71b72a947f599.rlib" "/tmp/rustcAtiqXU/libring-4d53c4d5f44e4130.rlib" "/tmp/rustcAtiqXU/libblake3-11d1a84355ec1b1c.rlib" "/tmp/rustcAtiqXU/libblst-8fba0f5d22acc9e8.rlib" "/home/runner/work/subspace/subspace/target/aarch64-unknown-linux-gnu/production/deps/libcompiler_builtins-0ab090af4b1843da.rlib" "-Wl,-Bdynamic" "-lm" "-lpthread" "-lz" "-lgcc_s" "-lutil" "-lrt" "-lpthread" "-lm" "-ldl" "-lc" "-Wl,--eh-frame-hdr" "-Wl,-z,noexecstack" "-L" "/home/runner/.rustup/toolchains/nightly-2023-10-16-x86_64-unknown-linux-gnu/lib/rustlib/aarch64-unknown-linux-gnu/lib" "-o" "/home/runner/work/subspace/subspace/target/aarch64-unknown-linux-gnu/production/deps/subspace_farmer-4ef2172de3fc95ae" "-Wl,--gc-sections" "-pie" "-Wl,-z,relro,-z,now" "-Wl,-O1" "-nodefaultlibs"
  = note: /usr/lib/gcc-cross/aarch64-linux-gnu/9/../../../../aarch64-linux-gnu/bin/ld: cannot find -lz
          collect2: error: ld returned 1 exit status
```

So I think it is fair to assume lstopo is where it was needed, though it is a bit difficult to figure out in a CI environment, so I didn't bother.
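A note on the guess about the linker: the link line above passes `-Wl,--as-needed` before `-lz`. With that option the linker records a library as a dependency only if it actually resolves a symbol, which would be consistent with libz being absent from both listings. The linker still has to locate the library at link time, hence `cannot find -lz` when the arm64 zlib is not installed. For a cross-compiled binary, `readelf -d` shows the recorded dependencies without needing the target's loader. A minimal sketch follows; `needed_libs` and the `BIN` variable are hypothetical names used only for illustration.

```shell
#!/bin/sh
# needed_libs: read `readelf -d` output on stdin and print one DT_NEEDED
# library name per line. Hypothetical helper for illustration.
needed_libs() {
    sed -n 's/.*(NEEDED).*\[\(.*\)\].*/\1/p'
}

# BIN is an assumed variable: set it to the binary you want to inspect,
# for example BIN=path/to/executable sh this-script.sh
if [ -n "${BIN:-}" ] && [ -e "$BIN" ]; then
    readelf -d "$BIN" | needed_libs
fi
```

If libz does not appear in that list, the final binary does not depend on it at run time, even though `-lz` was on the link line.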

bgoglin commented 1 month ago

Did you make any progress in understanding where libz comes from? --disable-lstopo --disable-testing --disable-tools would likely all be easy to implement (they somehow exist hidden behind --enable-embedded-mode which is not what you want).

nazar-pc commented 1 month ago

> Did you make any progress in understanding where libz comes from?

Unfortunately I did not, and I abandoned my attempts shortly after my last message. I would appreciate the additional flags and fixes though; they would simplify CI and make compilation of the library faster.

bgoglin commented 4 weeks ago

While working on --disable-utils, I am wondering whether the issue is within configure or the actual build. What if you just run a normal ./configure && make -C hwloc && make -C hwloc install (somewhat similar to --disable-utils, which keeps the configure checks but disables the utils/ directory)? Do you get any issue?

nazar-pc commented 4 weeks ago

As can be seen from the CI logs above, it was trying to link the library. Whether that was caused by the configuration or something else, I'm not entirely sure. I wasn't even calling configure myself; it is wrapped internally by this Rust library: https://github.com/HadrienG2/hwlocality

bgoglin commented 4 weeks ago

@hadrienG2 do you by chance have any idea about this?

HadrienG2 commented 4 weeks ago

In the configuration used by the current public main branch of subspace-farmer, and when building for Linux, I configure hwloc with --enable-static --disable-cairo --disable-io --disable-libxml2 (well, I also add --disable-shared, but for our purposes this shouldn't matter and keeping the shared library in allows me to use ldd in later checks).

The only other change I make with respect to the vanilla build configuration is to add some RPATHs for hwloc transitive dependencies so that the final binary works without LD_LIBRARY_PATH tricks even if the dependencies are in a weird location. This does not affect the set of linked libraries.

So I replicated a local hwloc build in this configuration, using the same hwloc release as current hwlocality vendored builds...

    curl https://download.open-mpi.org/release/hwloc/v2.10/hwloc-2.10.0.tar.gz | tar -xz
    cd hwloc-2.10.0/ && mkdir build && cd build
    ../configure --enable-static --disable-cairo --disable-io --disable-libxml2
    V=1 make -j$(nproc)

...and there is no mention of zlib or -lz in the configure script's output and the build commands.

The resulting libhwloc does not link to zlib...

    $ ldd hwloc/.libs/libhwloc.so
    linux-vdso.so.1 (0x00007f36408b7000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f364074e000)
    libudev.so.1 => /lib64/libudev.so.1 (0x00007f364070a000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f3640400000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f36408b9000)
    libcap.so.2 => /lib64/libcap.so.2 (0x00007f36406fc000)

...and most importantly for the error at hand, its pkg-config does not direct dependents to link to zlib either:

    $ cat hwloc.pc
    prefix=/usr/local
    exec_prefix=${prefix}
    libdir=${exec_prefix}/lib64
    includedir=${prefix}/include

    Name: hwloc
    Description: Hardware locality detection and management library
    Version: 2.10.0
    Requires.private: 
    Cflags: -I${includedir}
    Libs: -L${libdir} -lhwloc
    Libs.private: -lm  -ludev    -lpthread
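
The same check can be scripted for other builds. This is a minimal sketch that assumes the generated file is named `hwloc.pc` in the current directory; `pc_libs` and the `PC_FILE` variable are hypothetical names. For an installed hwloc, `pkg-config --static --libs hwloc` reports the flags a statically linking dependent would receive.

```shell
#!/bin/sh
# pc_libs FILE: print every -l flag from the Libs and Libs.private lines
# of a pkg-config file, one per line. Hypothetical helper.
pc_libs() {
    grep -E '^Libs(\.private)?:' "$1" | tr -s ' \t' '\n\n' | grep '^-l'
}

# PC_FILE is an assumed variable; it defaults to hwloc.pc in the build dir.
pc=${PC_FILE:-hwloc.pc}
if [ -e "$pc" ]; then
    if pc_libs "$pc" | grep -qx -- '-lz'; then
        echo "$pc asks dependents to link zlib"
    else
        echo "$pc does not mention -lz"
    fi
fi
```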

In fact, there is no mention of the -lz build option anywhere in the build directory.

    $ grep -r -- '-lz'
    Makefile:dist-lzip: distdir
    Makefile:   dist-all dist-bzip2 dist-gzip dist-hook dist-lzip dist-shar \

@nazar-pc Are you absolutely sure that (1) this error still occurs with current hwlocality, and (2) this -lz flag comes from the use of hwlocality, and not from any of the other dependencies of subspace_farmer?
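
One way to narrow down the second point is to search the saved build-script output in the Cargo target directory, since Cargo stores each build script's stdout in a file named `output` under `build/<package>-<hash>/`. The sketch below is only a starting point: `crates_linking` and `TARGET_DIR` are hypothetical names, and a crate can also request a library through a `#[link(name = "...")]` attribute in its source, which this search would not find.

```shell
#!/bin/sh
# crates_linking LIB DIR: print the Cargo build-script output files under
# DIR that contain a rustc-link-lib directive for LIB. Hypothetical helper.
crates_linking() {
    find "$2" -type f -name output -path '*/build/*' |
    while IFS= read -r f; do
        # Matches cargo:rustc-link-lib=z, cargo::rustc-link-lib=dylib=z, etc.
        if grep -qE "^cargo::?rustc-link-lib=([a-z:+,-]+=)?$1\$" "$f"; then
            printf '%s\n' "$f"
        fi
    done
}

# TARGET_DIR is an assumed variable pointing at the Cargo target directory.
if [ -d "${TARGET_DIR:-}" ]; then
    crates_linking z "$TARGET_DIR"
fi
```

The directory name of each printed path identifies the package whose build script asked for the library.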

nazar-pc commented 3 weeks ago

I tested with the current state of the code and libraries, and it seems zlib1g-dev:arm64 is no longer needed for cross-compilation.

bgoglin commented 3 weeks ago

Hmmm, I am closing the issue then, but I wonder what happened.