xiph / opus

Modern audio compression for the internet.
https://opus-codec.org/

`glibc`-build is encoding audio 2-3 times faster than `musl-libc` static build #302

Open vadimkantorov opened 1 year ago

vadimkantorov commented 1 year ago

I'm experimenting with portable binaries for encoding, so I built a static version of opusenc with musl-gcc. The problem is that, when encoding the same 3-hour file, the glibc build is 2-3 times faster. Do you have any idea why this happens, and any advice on obtaining the fastest-encoding build?

The gcc version seems to matter too. I also enclose opusenc_with_cc_on_different_machine, which consistently runs 15-45s faster than opusenc_with_cc; it was produced by the same build script but compiled on CentOS 7 with an older gcc.

opusenc.zip

opusenc_with_cc.asm.txt opusenc_with_muslc.asm.txt opusenc_with_cc_on_different_machine.asm.txt

Here is the test.wav: https://1drv.ms/u/s!Apx8USiTtrYmq9NafdDPxW0Gs78iKw?e=ECgVwb

opusenc_with_cc was copied from prefix_cc_0.1.10_1.3.1_0.11/bin/opusenc and opusenc_with_muslc from prefix_muslc_0.1.10_1.3.1_0.11/bin/opusenc; both were produced by the script below.

Thanks a lot!


if [ "$1" = "cc" ]; then
    export SUFFIX=cc
else
    export CFLAGS="-U_FORTIFY_SOURCE" 
    export LDFLAGS="--static -static -static-libgcc -lm -lc" 
    export CC=musl-gcc
    export SUFFIX=muslc
fi

wget -nc \
  https://archive.mozilla.org/pub/opus/libopusenc-0.2.1.tar.gz \
  https://downloads.xiph.org/releases/ogg/libogg-1.3.5.tar.gz \
  https://downloads.xiph.org/releases/opus/opusfile-0.11.tar.gz \
  https://downloads.xiph.org/releases/opus/opus-1.3.1.tar.gz \
  https://downloads.xiph.org/releases/opus/opus-tools-0.1.10.tar.gz
find -name '*.tar.gz' -exec tar -xf {} \;

export PREFIX="$(pwd)/prefix_${SUFFIX}_98f3ddc8e94b8be31ebdeac52805a93cfab395e7_1.3.1_0.11"
cd libogg-1.3.5
sh ./configure --prefix="$PREFIX" --disable-shared --enable-static
make install
cd ../opus-1.3.1
sh ./configure --prefix="$PREFIX" --disable-shared --enable-static --disable-maintainer-mode --disable-doc --disable-extra-programs
make install
cd ../opusfile-0.11
DEPS_LIBS="-L$PREFIX/lib -lopus -logg" DEPS_CFLAGS="-I$PREFIX/include -I$PREFIX/include/opus -I$PREFIX/include/ogg" sh ./configure --prefix="$PREFIX" --disable-shared --enable-static --disable-maintainer-mode --disable-examples --disable-doc --disable-http
make install
cd ../libopusenc-0.2.1
DEPS_LIBS="-L$PREFIX/lib -lopus" DEPS_CFLAGS="-I$PREFIX/include -I$PREFIX/include/opus" sh ./configure --prefix="$PREFIX" --disable-shared --enable-static --disable-maintainer-mode --disable-examples --disable-doc
make install
cd ../opus-tools-98f3ddc8e94b8be31ebdeac52805a93cfab395e7
sh ./autogen.sh
OGG_LIBS="-L$PREFIX/lib -logg" OGG_CFLAGS="-I$PREFIX/include -I$PREFIX/include/ogg" OPUS_LIBS="-L$PREFIX/lib -lopus -lm" OPUS_CFLAGS="-I$PREFIX/include/opus" Opus_LIBS="-L$PREFIX/lib -lopus -lm" Opus_CFLAGS="-I$PREFIX/include/opus" OPUSFILE_LIBS="-L$PREFIX/lib -lopusfile" OPUSFILE_CFLAGS="-I$PREFIX/include/opus" OPUSURL_LIBS="-L$PREFIX/lib -lopusurl" OPUSURL_CFLAGS="-I$PREFIX/include/opus" LIBOPUSENC_LIBS="-L$PREFIX/lib -lopusenc" LIBOPUSENC_CFLAGS="-I$PREFIX/include/opus" sh ./configure --prefix="$PREFIX" --with-ogg="$PREFIX" --without-flac --disable-shared --enable-static --disable-oggtest
make install
cd ..
gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
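
One thing that may be worth checking in the build script above, offered as a hypothesis rather than a confirmed diagnosis: autoconf-generated configure scripts normally default to `-g -O2` only when `CFLAGS` is not already set. Exporting `CFLAGS="-U_FORTIFY_SOURCE"` by itself could therefore leave the musl build without any optimization level. A minimal sketch of keeping an optimization flag when overriding `CFLAGS` (the helper name `ensure_opt_level` is hypothetical, and `-O2` is just an assumed baseline):

```shell
# Hypothetical helper: return the given flags unchanged if they already
# contain an -O<level> option, otherwise prepend -O2 (assumed baseline).
ensure_opt_level() {
    case " $1 " in
        *" -O"*) printf '%s\n' "$1" ;;
        *)       printf '%s\n' "-O2 $1" ;;
    esac
}

# Same extra flag as the musl branch of the script, plus an optimization level.
CFLAGS="$(ensure_opt_level "-U_FORTIFY_SOURCE")"
export CFLAGS
echo "CFLAGS=$CFLAGS"   # prints: CFLAGS=-O2 -U_FORTIFY_SOURCE
```

If the two builds turn out to use the same optimization level, this hypothesis can be ruled out and the difference has to come from somewhere else.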
vadimkantorov@delldevadim:/mnt/c/Users/vadim/opusenczoo$ c65621723a53b74862db2d7d4c2c69c0/prefix_muslc_0.1.10_1.3.1_0.11/bin/opusenc --bitrate 32 test.wav test.opus
Skipping chunk of type "LIST", length 26
Encoding using libopus 1.3.1 (audio)
-----------------------------------------------------
   Input: 48kHz 2 channels
  Output: 2 channels (2 coupled)
          20ms packets, 32kbit/sec VBR
 Preskip: 312

Encoding complete
-----------------------------------------------------
       Encoded: 2 hours, 58 minutes, and 10.04 seconds
       Runtime: 9 minutes and 25 seconds
                (18.92x realtime)
         Wrote: 42449564 bytes, 534502 packets, 10693 pages
       Bitrate: 31.151kbit/s (without overhead)
 Instant rates: 1.2kbit/s to 69.2kbit/s
                (3 to 173 bytes per packet)
      Overhead: 1.94% (container+metadata)

vadimkantorov@delldevadim:/mnt/c/Users/vadim/opusenczoo$ c65621723a53b74862db2d7d4c2c69c0/prefix_cc_0.1.10_1.3.1_0.11/bin/opusenc --bitrate 32 test.wav test.opus
Skipping chunk of type "LIST", length 26
Encoding using libopus 1.3.1 (audio)
-----------------------------------------------------
   Input: 48kHz 2 channels
  Output: 2 channels (2 coupled)
          20ms packets, 32kbit/sec VBR
 Preskip: 312

Encoding complete
-----------------------------------------------------
       Encoded: 2 hours, 58 minutes, and 10.04 seconds
       Runtime: 3 minutes and 14 seconds
                (55.1x realtime)
         Wrote: 42449564 bytes, 534502 packets, 10693 pages
       Bitrate: 31.1509kbit/s (without overhead)
 Instant rates: 1.2kbit/s to 69.2kbit/s
                (3 to 173 bytes per packet)
      Overhead: 1.94% (container+metadata)

$ opusenc_with_cc_on_different_machine  --bitrate 32 test.wav test.opus
Skipping chunk of type "LIST", length 26
Encoding using libopus 1.3.1 (audio)
-----------------------------------------------------
   Input: 48kHz 2 channels
  Output: 2 channels (2 coupled)
          20ms packets, 32kbit/sec VBR
 Preskip: 312

Encoding complete
-----------------------------------------------------
       Encoded: 2 hours, 58 minutes, and 10.04 seconds
       Runtime: 3 minutes and 1 seconds
                (59.06x realtime)
         Wrote: 42449564 bytes, 534502 packets, 10693 pages
       Bitrate: 31.1509kbit/s (without overhead)
 Instant rates: 1.2kbit/s to 69.2kbit/s
                (3 to 173 bytes per packet)
      Overhead: 1.94% (container+metadata)
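
To put numbers on the three runs above, here is a small sketch that converts the reported runtimes to seconds and computes the slowdown of the musl build relative to each glibc build. The helper names `speed_ratio` and `realtime_factor` are made up for illustration; the inputs are the figures from the logs.

```shell
# Hypothetical helper: ratio of slow runtime to fast runtime,
# each given as "minutes seconds".
speed_ratio() {
    awk -v sm="$1" -v ss="$2" -v fm="$3" -v fs="$4" \
        'BEGIN { printf "%.2f\n", (sm * 60 + ss) / (fm * 60 + fs) }'
}

# Hypothetical helper: audio duration in seconds divided by the runtime,
# i.e. the "x realtime" figure opusenc reports.
realtime_factor() {
    awk -v dur="$1" -v m="$2" -v s="$3" \
        'BEGIN { printf "%.2f\n", dur / (m * 60 + s) }'
}

speed_ratio 9 25 3 14            # musl vs glibc (gcc 11.4.0): 2.91
speed_ratio 9 25 3 1             # musl vs glibc (CentOS 7 build): 3.12
realtime_factor 10690.04 9 25    # 2h58m10.04s of audio in 9m25s: 18.92
```

The realtime factor recomputed this way agrees with the value opusenc printed for the musl run, so the runtimes in the logs are internally consistent.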

Counting the SSE instructions in the two versions:

awk '/[ \t](addps|addss|andnps|andps|cmpps|cmpss|comiss|cvtpi2ps|cvtps2pi|cvtsi2ss|cvtss2s|cvttps2pi|cvttss2si|divps|divss|ldmxcsr|maxps|maxss|minps|minss|movaps|movhlps|movhps|movlhps|movlps|movmskps|movntps|movss|movups|mulps|mulss|orps|rcpps|rcpss|rsqrtps|rsqrtss|shufps|sqrtps|sqrtss|stmxcsr|subps|subss|ucomiss|unpckhps|unpcklps|xorps|pavgb|pavgw|pextrw|pinsrw|pmaxsw|pmaxub|pminsw|pminub|pmovmskb|psadbw|pshufw)[ \t]/' opusenc_with_cc.asm.txt | wc -l
# 6565

awk '/[ \t](addps|addss|andnps|andps|cmpps|cmpss|comiss|cvtpi2ps|cvtps2pi|cvtsi2ss|cvtss2s|cvttps2pi|cvttss2si|divps|divss|ldmxcsr|maxps|maxss|minps|minss|movaps|movhlps|movhps|movlhps|movlps|movmskps|movntps|movss|movups|mulps|mulss|orps|rcpps|rcpss|rsqrtps|rsqrtss|shufps|sqrtps|sqrtss|stmxcsr|subps|subss|ucomiss|unpckhps|unpcklps|xorps|pavgb|pavgw|pextrw|pinsrw|pmaxsw|pmaxub|pminsw|pminub|pmovmskb|psadbw|pshufw)[ \t]/' opusenc_with_muslc.asm.txt | wc -l
# 9002
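
The totals above do not say which instructions account for the difference. A per-mnemonic histogram might help, for instance to see whether the surplus is mostly scalar moves such as `movss` or packed arithmetic. A rough sketch, assuming the `.asm.txt` files are objdump-style listings with the mnemonic as a whitespace-separated field; `sse_histogram` is a hypothetical helper and its mnemonic list is a shortened subset of the one used above:

```shell
# Hypothetical helper: print "count mnemonic" for a few SSE mnemonics
# found in a disassembly listing, most frequent first.
sse_histogram() {
    awk '
        {
            # Check every whitespace-separated field for an exact mnemonic match.
            for (i = 1; i <= NF; i++)
                if ($i ~ /^(movss|movaps|movups|mulss|mulps|addss|addps|subss|subps|cvtsi2ss|cvttss2si)$/)
                    count[$i]++
        }
        END { for (m in count) print count[m], m }
    ' "$1" | sort -k1,1nr -k2,2
}

# Usage (file names from the attachments above):
#   sse_histogram opusenc_with_cc.asm.txt
#   sse_histogram opusenc_with_muslc.asm.txt
```

Comparing the two histograms side by side would show where the extra instructions in the musl build are concentrated.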
vadimkantorov commented 12 months ago

Maybe the per-frame file read/write syscalls are the reason, perhaps due to differences in buffering or fsync calls between glibc and musl-libc?
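
One way to test this hypothesis would be to collect a syscall summary for each binary with `strace -c` and compare the number of `read`, `write` and `fsync` calls. The sketch below assumes the summaries have been saved to files; the file names and the helper `syscall_calls` are hypothetical, and it relies on the usual `strace -c` table layout with the call count in the fourth column and the syscall name in the last.

```shell
# Summaries would be collected first, e.g. (not run here):
#   strace -c -f -o glibc.summary ./opusenc_with_cc    --bitrate 32 test.wav test.opus
#   strace -c -f -o musl.summary  ./opusenc_with_muslc --bitrate 32 test.wav test.opus

# Hypothetical helper: print the "calls" column for one syscall from a
# strace -c summary file, or 0 if the syscall does not appear in it.
syscall_calls() {
    awk -v name="$2" '
        $NF == name { print $4; found = 1 }
        END { if (!found) print 0 }
    ' "$1"
}

# Usage:
#   syscall_calls glibc.summary write
#   syscall_calls musl.summary  write
```

If the call counts are similar for both builds, the time is more likely being spent in user-space code than in I/O.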