zalando / spilo

Highly available elephant herd: HA PostgreSQL cluster using Docker
Apache License 2.0

wal-g: error while loading shared libraries: libsodium.so.23: cannot open shared object file: No such file or directory #876

Closed veragini closed 1 year ago

veragini commented 1 year ago

Hi Team,

I cloned the master branch 10 days ago, as it contains fixes for a few issues present in the latest version, 3.0-p1.

While testing wal-g, I came across the following error:

root@f2b504b95e3e:/home/postgres# wal-g
wal-g: error while loading shared libraries: libsodium.so.23: cannot open shared object file: No such file or directory

Am I correct? Looking at the docker build log, I can see the following from dependencies.sh.

First, it seems to prompt for confirmation:

Adding repository.
Press [ENTER] to continue or Ctrl-c to cancel.

Perhaps we should add -y here:

add-apt-repository ppa:longsleep/golang-backports -y

And later the following, which does stop and seems not to perform the libsodium linking.

line 33: cd: /wal-g: No such file or directory

Perhaps /wal-g should be passed as the target directory on the git clone line:

git clone -b "$WALG_VERSION" --recurse-submodules https://github.com/wal-g/wal-g.git /wal-g

Thanks,
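
The cd error quoted above suggests the script carries on after a failed directory change. A minimal sketch of a guard follows; the helper name enter_or_fail is hypothetical and not part of dependencies.sh, and the commented usage simply reuses the clone line proposed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from dependencies.sh): change into a directory,
# or report the problem and return non-zero so the caller can stop.
enter_or_fail() {
    local dir=$1
    if ! cd "$dir" 2>/dev/null; then
        echo "cannot enter $dir, aborting" >&2
        return 1
    fi
}

# Possible usage inside a build script (assumption, shown as comments only):
#   git clone -b "$WALG_VERSION" --recurse-submodules https://github.com/wal-g/wal-g.git /wal-g
#   enter_or_fail /wal-g || exit 1
```

With a guard like this, a missing /wal-g directory would end the step immediately instead of letting later commands run in the wrong place.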

veragini commented 1 year ago

I will take a further look. Although the build no longer reports the above after I made these two changes in dependencies.sh, I am still getting the same error:

root@5f8c214e9164:/home/postgres# wal-g
wal-g: error while loading shared libraries: libsodium.so.23: cannot open shared object file: No such file or directory
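
One way to see which shared libraries the loader cannot resolve for a binary is ldd. The sketch below assumes ldd is available in the image; the function name missing_libs is made up for illustration.

```shell
#!/usr/bin/env bash
# Print the shared libraries that the dynamic loader cannot resolve for a binary.
# The function name is hypothetical; ldd itself is the standard glibc tool.
missing_libs() {
    local binary=$1
    # ldd marks unresolved dependencies as "<name> => not found";
    # keep only the library name from those lines.
    ldd "$binary" 2>/dev/null | awk '/not found/ {print $1}'
}

# Possible usage inside the container (shown as a comment only):
#   missing_libs "$(command -v wal-g)"
```

An empty result means every dependency was resolved. For the error reported here, the output would be expected to name libsodium.so.23.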

veragini commented 1 year ago

I have more data to share:

I copied dependencies.sh into my local spilo image, built from master, and ran it inside a container from that image in order to test the wal-g installation.

docker run --rm -it spilo:latest bash

./dependencies.sh

It does almost everything, but fails at this test stage:

PASS: siphashx24
PASS: xchacha20
../../build-aux/test-driver: line 107: 40573 Killed                  "$@" > $log_file 2>&1
FAIL: pwhash_scrypt
============================================================================
Testsuite summary for libsodium 1.0.17
============================================================================
# TOTAL: 75
# PASS:  74
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0
============================================================================
See test/default/test-suite.log
Please report to https://github.com/jedisct1/libsodium/issues
============================================================================
make[4]: *** [Makefile:1893: test-suite.log] Error 1
make[4]: Leaving directory '/wal-g/tmp/libsodium/test/default'
make[3]: *** [Makefile:2001: check-TESTS] Error 2
make[3]: Leaving directory '/wal-g/tmp/libsodium/test/default'
make[2]: *** [Makefile:2593: check-am] Error 2
make[2]: Leaving directory '/wal-g/tmp/libsodium/test/default'
make[1]: *** [Makefile:401: check-recursive] Error 1
make[1]: Leaving directory '/wal-g/tmp/libsodium/test'
make: *** [Makefile:514: check-recursive] Error 1
+ export USE_LIBSODIUM=1
+ USE_LIBSODIUM=1
+ export USE_LZO=1
+ USE_LZO=1
+ make pg_build
(cd main/pg && go build -mod vendor -tags "brotli libsodium lzo" -o wal-g -ldflags "-s -w -X github.com/wal-g/wal-g/cmd/pg.buildDate=`date -u +%Y.%m.%d_%H:%M:%S` -X github.com/wal-g/wal-g/cmd/pg.gitRevision=`git rev-parse --short HEAD` -X github.com/wal-g/wal-g/cmd/pg.walgVersion=`git tag -l --points-at HEAD`")
+ bash -s
+ printf 'shopt -s extglob\nrm /builddeps/!(*_%s.deb)' amd64
rm: cannot remove '/builddeps/!(*_amd64.deb)': No such file or directory

However, it makes wal-g work.

root@426df134d9bf:/wal-g# wal-g --version
wal-g version v2.0.1    b7d53dd7    2023.05.01_02:26:27 PostgreSQL
root@426df134d9bf:/wal-g# wal-g --help
PostgreSQL backup tool

Usage:
wal-g [command]

Available Commands:
backup-fetch  Fetches a backup from storage
backup-list   Prints available backups
backup-mark   Marks a backup permanent or impermanent
backup-push   Makes backup and uploads it to storage
catchup-fetch Fetches an incremental backup from storage
catchup-list  Prints available incremental backups
catchup-push  Creates incremental backup from lsn
completion    Output shell completion code for the specified shell
copy          copy specific or all backups
delete        Clears old backups and WALs
flags         Display the list of available global flags for all wal-g commands
help          Help about any command
pgbackrest    Interact with pgbackrest backups (beta)
st            (DANGEROUS) Storage tools
wal-fetch     Fetches a WAL file from storage
wal-push      Uploads a WAL file to storage
wal-receive   Receive WAL stream with postgres Streaming Replication Protocol and push to storage
wal-restore   Restores WAL segments from storage.
wal-show      Show storage WAL segments info grouped by timelines.
wal-verify    Verify WAL storage folder. Available checks: integrity, timeline.

Flags:
      --config string   config file (default is $HOME/.walg.json)
  -h, --help            help for wal-g
      --turbo           Ignore all kinds of throttling defined in config
  -v, --version         version for wal-g

To get the complete list of all global flags, run: 'wal-g flags'

Use "wal-g [command] --help" for more information about a command.

I was wondering whether there is a later step in the main Dockerfile that removes dependencies wal-g needs?

I will continue looking at it.

Cheers,
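
In the log above, the failed libsodium test run is followed by make pg_build, so the script evidently keeps going after an error. The sketch below only illustrates the general difference that set -e makes in bash. It is not taken from dependencies.sh, and the function name run_steps is hypothetical.

```shell
#!/usr/bin/env bash
# run_steps MODE: run a failing step followed by a second step in a child shell.
# MODE "strict" enables set -e; any other value leaves it off.
run_steps() {
    local mode=$1
    bash -c '
        if [ "$1" = strict ]; then set -e; fi
        false                    # stands in for a failing build step
        echo "next step ran"     # stands in for the step that follows
    ' _ "$mode"
}

# run_steps lenient   -> prints "next step ran", exit status 0
# run_steps strict    -> prints nothing, exit status 1
```

Whether stopping at the first error is desirable for this build is a separate question; the sketch just shows why a failure can go unnoticed in the final result.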

hughcapet commented 1 year ago

First of all, how do you build the image? (the exact command)

veragini commented 1 year ago

Yes, sure.

git clone https://github.com/zalando/spilo.git

cd spilo/postgres-appliance/

docker buildx build --platform=linux/amd64 --build-arg PGVERSION=15 -t spilo .

hughcapet commented 1 year ago

Let's start from the very beginning. You run these commands and the build is successful, but then you run the wal-g command in the container and it fails because libsodium is missing?

What is this about then?

"And later the following, which does stop and seems not to perform the libsodium linking."
line 33: cd: /wal-g: No such file or directory

Do you see this in the build logs without it failing the whole build? Or what?

veragini commented 1 year ago

Initially, after building the docker image and while testing it, I got to testing backups with wal-g and found that simply running the wal-g command returned the following error:

wal-g: error while loading shared libraries: libsodium.so.23: cannot open shared object file: No such file or directory

As soon as I spotted that, I started looking into how wal-g is packaged in the spilo image. So I built it again and watched the build process, where I could quickly see some warnings and an error while installing wal-g.

If you build a new docker image, and simply run wal-g, you'll see the issue.

You can run a quick test by copying dependencies.sh into an already built spilo image and running it, and you will see what I mean:

docker run --rm -it spilo:latest bash

It will install wal-g, and you will be able to run wal-g without the error (there were some warnings, and an error as well, but I could still run wal-g --version or plain wal-g). So, although I think we need to make a few changes to dependencies.sh as mentioned at the beginning of this issue, running it this way does in fact resolve the following error:

error while loading shared libraries: libsodium.so.23: cannot open shared object file: No such file or directory

However, if you build a fresh spilo image, with dependencies.sh doing what it is supposed to do, it does not resolve the libsodium.so.23: cannot open shared object file error above, which suggests there is probably a later step in the Dockerfile build that removes a wal-g dependency.

I hope it is clear.

Thanks
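
To answer the question about what the build logs show, it can help to save the full log and search it for the failure patterns quoted in this thread. A rough sketch, assuming the log was written to a file first; the function name scan_build_log and the file name build.log are hypothetical.

```shell
#!/usr/bin/env bash
# Print numbered log lines that match the failure patterns seen in this thread.
scan_build_log() {
    local log=$1
    grep -nE 'No such file or directory|\*\*\* .*Error [0-9]+|Segmentation fault|Killed' "$log"
}

# Possible usage (shown as comments only; build.log is an arbitrary file name):
#   docker buildx build --progress=plain --platform=linux/amd64 \
#       --build-arg PGVERSION=15 -t spilo . 2>&1 | tee build.log
#   scan_build_log build.log
```

The --progress=plain option makes buildx print the full output of each step, which is what makes errors inside a step that did not fail the build visible.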

hughcapet commented 1 year ago

Oh, I actually also have a problem with libsodium and the wal-g build in general when I build Spilo using buildx for a platform different from my host one:

podman buildx build --platform=linux/amd64 --build-arg PGVERSION=15 --build-arg PGOLDVERSIONS="" -t spilo .
+ bash link_libsodium.sh
...
make[3]: Entering directory '/wal-g/tmp/libsodium/src/libsodium'
  CC       crypto_shorthash/siphash24/libsodium_la-shorthash_siphashx24.lo
  CC       crypto_shorthash/siphash24/ref/libsodium_la-shorthash_siphashx24_ref.lo
  CC       crypto_sign/ed25519/ref10/libsodium_la-obsolete.lo
  CC       crypto_generichash/blake2b/ref/libssse3_la-blake2b-compress-ssse3.lo
  CC       crypto_pwhash/argon2/libssse3_la-argon2-fill-block-ssse3.lo
  CC       crypto_generichash/blake2b/ref/libsse41_la-blake2b-compress-sse41.lo
  CC       crypto_generichash/blake2b/ref/libavx2_la-blake2b-compress-avx2.lo
../../libtool: line 1752: 31478 Segmentation fault      (core dumped) gcc -DPACKAGE_NAME=\"libsodium\" -DPACKAGE_TARNAME=\"libsodium\" -DPACKAGE_VERSION=\"1.0.17\" "-DPACKAGE_STRING=\"libsodium 1.0.17\"" -DPACKAGE_BUGREPORT=\"https://github.com/jedisct1/libsodium/issues\" -DPACKAGE_URL=\"https://github.com/jedisct1/libsodium\" -DPACKAGE=\"libsodium\" -DVERSION=\"1.0.17\" -DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -D__EXTENSIONS__=1 -D_ALL_SOURCE=1 -D_GNU_SOURCE=1 -D_POSIX_PTHREAD_SEMANTICS=1 -D_TANDEM_SOURCE=1 -DHAVE_C_VARARRAYS=1 -DHAVE_CATCHABLE_SEGV=1 -DHAVE_CATCHABLE_ABRT=1 -DTLS=_Thread_local -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_MMINTRIN_H=1 -DHAVE_EMMINTRIN_H=1 -DHAVE_PMMINTRIN_H=1 -DHAVE_TMMINTRIN_H=1 -DHAVE_SMMINTRIN_H=1 -DHAVE_AVXINTRIN_H=1 -DHAVE_AVX2INTRIN_H=1 -DHAVE_AVX512FINTRIN_H=1 -DHAVE_WMMINTRIN_H=1 -DHAVE_RDRAND=1 -DHAVE_SYS_MMAN_H=1 -DNATIVE_LITTLE_ENDIAN=1 -DHAVE_INLINE_ASM=1 -DHAVE_AMD64_ASM=1 -DHAVE_AVX_ASM=1 -DHAVE_TI_MODE=1 -DHAVE_CPUID=1 -DASM_HIDE_SYMBOL=.hidden -DHAVE_WEAK_SYMBOLS=1 -DCPU_UNALIGNED_ACCESS=1 -DHAVE_ATOMIC_OPS=1 -DHAVE_ALLOCA_H=1 -DHAVE_ALLOCA=1 -DHAVE_MMAP=1 -DHAVE_MLOCK=1 -DHAVE_MADVISE=1 -DHAVE_MPROTECT=1 -DHAVE_EXPLICIT_BZERO=1 -DHAVE_NANOSLEEP=1 -DHAVE_POSIX_MEMALIGN=1 -DHAVE_GETPID=1 -DCONFIGURED=1 -I. -I./include/sodium -I./include/sodium -msse2 -mssse3 -msse4.1 -g -O2 -pthread -fvisibility=hidden -fPIC -fno-strict-aliasing -fno-strict-overflow -fstack-protector -ftls-model=local-dynamic -MT crypto_generichash/blake2b/ref/libsse41_la-blake2b-compress-sse41.lo -MD -MP -MF crypto_generichash/blake2b/ref/.deps/libsse41_la-blake2b-compress-sse41.Tpo -c crypto_generichash/blake2b/ref/blake2b-compress-sse41.c -o crypto_generichash/blake2b/ref/.libs/libsse41_la-blake2b-compress-sse41.o
make[3]: *** [Makefile:2965: crypto_generichash/blake2b/ref/libsse41_la-blake2b-compress-sse41.lo] Error 1
make[3]: *** Waiting for unfinished jobs....
make[3]: Leaving directory '/wal-g/tmp/libsodium/src/libsodium'
make[2]: *** [Makefile:3097: all-recursive] Error 1
make[2]: Leaving directory '/wal-g/tmp/libsodium/src/libsodium'
make[1]: *** [Makefile:398: all-recursive] Error 1
make[1]: Leaving directory '/wal-g/tmp/libsodium/src'
make: *** [Makefile:514: all-recursive] Error 1
+ make pg_build
(cd main/pg && go build -mod vendor -tags "brotli libsodium lzo" -o wal-g -ldflags "-s -w -X github.com/wal-g/wal-g/cmd/pg.buildDate=`date -u +%Y.%m.%d_%H:%M:%S` -X github.com/wal-g/wal-g/cmd/pg.gitRevision=`git rev-parse --short HEAD` -X github.com/wal-g/wal-g/cmd/pg.walgVersion=`git tag -l --points-at HEAD`")
 google.golang.org/grpc/balancer/base: /usr/lib/go-1.20/pkg/tool/linux_amd64/compile: signal: segmentation fault (core dumped)
 # github.com/DataDog/zstd
gcc: internal compiler error: Segmentation fault signal terminated program cc1
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-11/README.Bugs> for instructions.
github.com/wal-g/wal-g/internal/webserver: /usr/lib/go-1.20/pkg/tool/linux_amd64/compile: signal: segmentation fault (core dumped)

If you build it with podman like me, or on arm64 with target platform amd64, then maybe this is a setup-specific problem (our CI, for example, doesn't seem to have this problem when building Spilo for arm64 using buildx).
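
A quick way to tell whether a build targets a platform different from the host is to compare uname -m with the --platform value. The sketch below covers only the two architectures mentioned in this thread, and the function names are hypothetical. Whether a cross-platform build actually explains the segmentation faults is not confirmed here.

```shell
#!/usr/bin/env bash
# Map a machine name (as printed by uname -m) to a container platform string.
platform_for() {
    case "$1" in
        x86_64|amd64)  echo "linux/amd64" ;;
        aarch64|arm64) echo "linux/arm64" ;;
        *)             echo "unknown" ;;
    esac
}

# Succeed (exit 0) when the target platform differs from the host's.
is_cross_build() {
    local target=$1
    [ "$(platform_for "$(uname -m)")" != "$target" ]
}

# Possible usage (shown as a comment only):
#   is_cross_build linux/amd64 && echo "target differs from host platform"
```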