ultravideo / kvazaar

An open-source HEVC encoder
BSD 3-Clause "New" or "Revised" License
821 stars 178 forks source link

Segmentation fault on alpine linux #410

Closed AlexanderSchuetz97 closed 1 month ago

AlexanderSchuetz97 commented 1 month ago

When compiling kvazaar on the alpine:latest (also happens with alpine linux v3.10 from 2020, exact same issue) docker image using the recommended settings:

./autogen.sh
./configure
make

then running the resulting kvazaar binary "./kvazaar -i /tmp/in.yuf --input-res=400x120 -o /tmp/out.hevc" causes a segmentation fault

Doing the same on debian bookworm causes no segmentation fault and the output file is correct. I can reproduce this with any input file. I have installed the following apk packages in the container:

apk add nasm git alpine-sdk bash cmake ninja meson perl autoconf automake libtool gperf xz libogg yasm diffutils gawk linux-headers asciidoc doxygen xmlto gtest clang

other than that, the container is blank.

Ive tried compiling with CC=clang too, same result. Is there perhaps an issue with libc-musl?

EDIT: The musl-gcc cross compiler on debian also produces a binary that segmentation faults at the same position. This appears to be a incompatability with libc-musl. Any idea why that is? This effectively prevents me from using ffmpeg with libkvazaar on alpine linux.

AlexanderSchuetz97 commented 1 month ago

This is the output from GDB from a gcc build

bash-5.0# gdb --args kvazaar -i /tmp/out.yuf -o /tmp/out.hevc --input-res=400x120
GNU gdb (GDB) 8.3
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-alpine-linux-musl".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from kvazaar...
(gdb) cont
The program is not being run.
(gdb) run
Starting program: /usr/local/bin/kvazaar -i /tmp/out.yuf -o /tmp/out.hevc --input-res=400x120
warning: Error disabling address space randomization: Operation not permitted
Compiled: INTEL, flags: MMX SSE SSE2
Detected: INTEL, flags: MMX SSE SSE2 SSE3 SSSE3 SSE41 SSE42 AVX AVX2
Available: avx2(55) sse2(2) sse41(4) 
In use: avx2(55) sse2(1) 
--owf=auto value set to 4.
--threads=auto value set to 4.
[New LWP 6823]
[New LWP 6824]
[New LWP 6825]
[New LWP 6826]
Input: /tmp/out.yuf, output: /tmp/out.hevc
  Video size: 400x120 (input=400x120)
[New LWP 6827]

Thread 2 "kvazaar" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 6823]
0x00007f4aed8034e1 in memcpy () from /lib/ld-musl-x86_64.so.1
(gdb) bt
#0  0x00007f4aed8034e1 in memcpy () from /lib/ld-musl-x86_64.so.1
#1  0x00007f4aed74425b in kvz_search_lcu (state=state@entry=0x556814a2de20, x=<optimized out>, y=<optimized out>, hor_buf=<optimized out>, ver_buf=<optimized out>) at search.c:1202
#2  0x00007f4aed7206a4 in encoder_state_worker_encode_lcu (opaque=0x556814a2e880) at encoderstate.c:659
#3  0x00007f4aed74daf0 in threadqueue_worker (threadqueue_opaque=0x7f4aed845da0) at threadqueue.c:300
#4  0x00007f4aed804f82 in ?? () from /lib/ld-musl-x86_64.so.1
#5  0x00007f4aed6ed000 in ?? ()
#6  0x00007f4aed80704d in ?? () from /lib/ld-musl-x86_64.so.1
#7  0x00007f4aed74da30 in ?? () at threadqueue.c:531 from /usr/local/lib/libkvazaar.so.7
#8  0x00007f4aed845da0 in ?? ()
#9  0x0000000000000000 in ?? ()
AlexanderSchuetz97 commented 1 month ago

This appears to be some sort of multi-threading issue? When forcefully disabling multithreaded encoding the segmentation fault does not appear. I have modified encoderstate.c locally like this:

bool use_parallel_encoding = (wavefront && state->parent->children[1].encoder_control);
use_parallel_encoding = false;
if (!use_parallel_encoding) {

This will result in a binary with musl that does not segfault

AlexanderSchuetz97 commented 1 month ago

Ive found the issue, the program overruns the stack in search.c line 1199.

lcu_t work_tree[MAX_PU_DEPTH + 1];

for some reason the musl libc allocates a smaller stack than glibc for different threads. I will create a pr that mallocs this allocation.

EDIT: While this fixes this SEGFAULT, the program just explodes at the next possible location. You have large stack variables in multiple locations. I do not have the time to debug, find and move them all to the heap. just fyi. The stack is between 80k-128k on musl libc. Its about 2-10MB for glibc. Your threadwrapper implementation sadly abstracts the call to increase the stack size at runtime away. I have managed to compile kvazaar successfull using export LDFLAGS=-Wl,-z,stack-size=10485760 before running ./configure I recommend adding this to the README.md if you do not have the time to find and fix those large stack allocations.

Jovasa commented 1 month ago

Thanks for reportting this, I will discuss with Fador next week when he is back from his holiday. I'd say that it is unlikely that we'll move the large stack allocations to heap, because it would bring a noticeable encoding speed degradation.

Your threadwrapper implementation sadly abstracts the call to increase the stack size at runtime away.

The threadwrapper is only used in Windows, so you should be able to use the full pthreads API in any flavour of linux.

AlexanderSchuetz97 commented 1 month ago

In that case you should/could call pthread_attr_setstacksize with enough stacksize so that all your allocations fit. I think this can be safely called for all libc's even normal glibc. I can confirm that 10mb is enough. But since this is a linker flag I am using its now reserving 10mb for each thread in all of ffmpeg. Id prefer it if kvazaar only did this for its threads and with a reasonable size.

Jovasa commented 1 month ago

1mb should be enough, because that's the default in Windows, and we are most likely going to set it at that.

Jovasa commented 1 month ago

Should be fixed with 1eeccb3c98da95601cff74849d24a14e98f4c56b

AlexanderSchuetz97 commented 1 month ago

Will provide feedback tomorrow

AlexanderSchuetz97 commented 1 month ago

Problem is fixed. Works as expected in standalone binary and inside ffmpeg.