nschlia / ffmpegfs

FUSE-based transcoding filesystem with video support from many formats to FLAC, MP4, TS, WebM, OGG, MP3, HLS, and others.
https://nschlia.github.io/ffmpegfs/
GNU General Public License v3.0
198 stars 14 forks source link

GPF and core dump if not enough disk space to transcode #24

Closed nschlia closed 5 years ago

nschlia commented 5 years ago

It seems browsing in Explorer is now much quicker, I did run into segfaults almost immediately. Trying to open a folder containing a feature length mkv using Resolve file had the following result (with Resolve hanging for a long time before the segfault):

 # ffmpegfs -f --win_smb_fix=1 --log_stderr --max_inactive_suspend=1h --max_inactive_abort=24h --desttype=prores+aiff /mnt/datasets/lz4 /mnt/virtual/ffmpegfs -o allow_other,ro
2019-01-11 23:11:50 INFO   : FFMPEGFS V1.6 initialising.
2019-01-11 23:11:50 INFO   : Mapping '/mnt/datasets/lz4/' to '/mnt/virtual/ffmpegfs/'.
2019-01-11 23:11:50 INFO   : Process with PID 21775 is now client, master is PID 12940.
2019-01-11 23:12:45 INFO   : [/mnt/datasets/lz4/Movies/bluray/bluray.mkv] Transcoding to prores. Predicted size 137378311110 bytes.
2019-01-11 23:12:45 INFO   : [/mnt/datasets/lz4/Movies/bluray/bluray.mkv] Opening input file.
2019-01-11 23:12:45 INFO   : [/mnt/datasets/lz4/Movies/bluray/bluray.mkv] Video in: h264@12.13 Mbps [unset]
2019-01-11 23:12:45 INFO   : [/mnt/datasets/lz4/Movies/bluray/bluray.mkv] Audio in: dts@1.54 Mbps 6 Channels 48.000 kHz [unset]
2019-01-11 23:12:45 INFO   : [/mnt/virtual/ffmpegfs/bluray.mov] Opening output file.
2019-01-11 23:12:45 INFO   : [/mnt/virtual/ffmpegfs/bluray.mov] Audio out: pcm_s16le@1.41 Mbps 2 Channels 44.100 kHz [01:43:10.4]
2019-01-11 23:12:45 INFO   : [/mnt/virtual/ffmpegfs/bluray.mov] Video out: prores@2.10 Mbps [01:43:10.4]
2019-01-11 23:12:46 INFO   : [/mnt/virtual/ffmpegfs/bluray.mov] Creating audio resampler: fltp -> s16 / 48.000 kHz -> 44.100 kHz / 5.1(side) -> stereo.
2019-01-11 23:12:46 WARNING: [/mnt/virtual/ffmpegfs/bluray.mov] Non-monotonous DTS in video output stream; previous: 0, current: 0; changing to 1. This may result in incorrect timestamps in the output.
2019-01-11 23:16:54 INFO   : [/mnt/virtual/ffmpegfs/bluray.mov] Client has gone away.
2019-01-11 23:16:54 ERROR  : [/tmp/ffmpegfs//mnt/virtual/ffmpegfs//mnt/datasets/lz4/Movies/bluray/bluray.mkv.cache.prores] Error calling ftruncate() to resize the file: Bad file descriptor
Segmentation fault
nschlia commented 5 years ago

gdb check of the core shows:

(gdb) bt
#0  0x00007fe8fdc5372a in ?? () from /lib64/libc.so.6
#1  0x000055c9c0f406dc in memcpy (__len=900653, __src=0x7fe8e41054f0, __dest=<optimized out>) at /usr/include/bits/string_fortified.h:34
#2  Buffer::write (this=0x7fe8ec001d40, data=0x7fe8e41054f0 "", length=900653) at buffer.cc:312
#3  0x000055c9c0f578d1 in FFMPEG_Transcoder::output_write (opaque=<optimized out>, data=<optimized out>, size=<optimized out>) at ffmpeg_transcoder.cc:3733
#4  0x00007fe8ff915fef in ?? () from /usr/lib64/libavformat.so.57
#5  0x00007fe8f8acda60 in ?? ()
#6  0x00007fe8e40cb9b0 in ?? ()
#7  0x00007fe8f8acda60 in ?? ()
#8  0x0000000000bd218a in ?? ()
#9  0x0000000000bd218a in ?? ()
#10 0x0000000000000000 in ?? ()
(gdb) 2
Undefined command: "2".  Try "help".
(gdb) f 2
#2  Buffer::write (this=0x7fe8ec001d40, data=0x7fe8e41054f0 "", length=900653) at buffer.cc:312
312             memcpy(write_ptr, data, length);
(gdb) l
307         {
308             length = 0;
309         }
310         else
311         {
312             memcpy(write_ptr, data, length);
313             increment_pos(length);
314         }
315
316         return length;
(gdb) 

This is directly after write_prepare that prints "Error calling ftruncate() to resize the file: Bad file descriptor" which indicates that the FFMPEG_transcode object has become invalid. This means the decoder thread is still running, which is OK, but it is weird because the object will only be destroyed after the decode loop exited and that should have created some log entries (and of course transcoding should not continue in that case).

Probably someone trashed the memory...

nschlia commented 5 years ago

Because on my Funtoo gdb did not show detailed output, here directly from the original machine:

 ~/ffmpegfs # gdb -c core /usr/local/bin/ffmpegfs
GNU gdb (Gentoo 8.2 p1) 8.2
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://bugs.gentoo.org/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/bin/ffmpegfs...done.
[New LWP 1241]
[New LWP 1242]
[New LWP 31932]
[New LWP 31935]
[New LWP 32100]
[New LWP 1243]
[New LWP 31933]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `ffmpegfs -f --win_smb_fix=1 --log_stderr --max_inactive_suspend=1h --max_inacti'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fe8fdc5372a in ?? () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7fe8f8ace700 (LWP 1241))]
(gdb) bt
#0  0x00007fe8fdc5372a in ?? () from /lib64/libc.so.6
#1  0x000055c9c0f406dc in memcpy (__len=900653, __src=0x7fe8e41054f0, __dest=<optimized out>) at /usr/include/bits/string_fortified.h:34
#2  Buffer::write (this=0x7fe8ec001d40, data=0x7fe8e41054f0 "", length=900653) at buffer.cc:312
#3  0x000055c9c0f578d1 in FFMPEG_Transcoder::output_write (opaque=<optimized out>, data=<optimized out>, size=<optimized out>)
    at ffmpeg_transcoder.cc:3733
#4  0x00007fe8ff915fef in ?? () from /usr/lib64/libavformat.so.57
#5  0x00007fe8ff916060 in ?? () from /usr/lib64/libavformat.so.57
#6  0x00007fe8ff917415 in avio_flush () from /usr/lib64/libavformat.so.57
#7  0x00007fe8ff993451 in ?? () from /usr/lib64/libavformat.so.57
#8  0x00007fe8ff993c1d in ?? () from /usr/lib64/libavformat.so.57
#9  0x00007fe8ff99543f in ?? () from /usr/lib64/libavformat.so.57
#10 0x00007fe8ff99572b in ?? () from /usr/lib64/libavformat.so.57
#11 0x00007fe8ff9ac862 in ?? () from /usr/lib64/libavformat.so.57
#12 0x00007fe8ff9aedc2 in av_interleaved_write_frame () from /usr/lib64/libavformat.so.57
#13 0x000055c9c0f6339f in FFMPEG_Transcoder::encode_video_frame (this=this@entry=0x7fe8e4000b20, frame=<optimized out>,
    data_present=data_present@entry=0x7fe8f8acdc94) at ffmpeg_transcoder.cc:3000
#14 0x000055c9c0f63a85 in FFMPEG_Transcoder::process_single_fr (this=this@entry=0x7fe8e4000b20, status=@0x7fe8f8acdd74: 0)
    at ffmpeg_transcoder.cc:3343
#15 0x000055c9c0f2b179 in decoder_thread (arg=0x7fe8f40d9e00) at transcode.cc:685
#16 0x00007fe8fdf7c85a in start_thread () from /lib64/libpthread.so.0
#17 0x00007fe8fdcaee2f in clone () from /lib64/libc.so.6
nschlia commented 5 years ago

The decoder thread is still running, which is totally OK as a long time out is set, that should be sufficient to transcode most files to cache. I need to find out who pulls the FFMPEG_transcode object away.

nschlia commented 5 years ago

That was actually a disk space problem. When the disk space was not sufficient to store the (predicted) transcoded file size, transcoding would be aborted. But this was not reported, and the file was attempted to be closed twice. The second close caused the crash.

I fixed that by checking if the file is not open first and not closing it again. This fixes the crash. I also added proper error reporting. The integer overflow I suspected was only a wrong printf format string, %u instead of %zu for size_t.

Now transcoding will abort e.g. a 290 GB file if there's only 40 GB disk space left on the cache drive.

So this was not a Funtoo problem, I only found out because I tried that on my laptop which is a bit akward in disk space. My other test machine happily provided the 300 Gigs so I was desperately seeking a crash that would not occur :)