Segfault when using ffmpeg to decode MP3s

jjedele commented 4 years ago

I'm trying to use tfio.IOTensor.from_ffmpeg to build a simple data input pipeline reading MP3s, but it results in a segfault after processing a certain number of files. I ran several tests to verify that the cause really is the number of files and not a corrupt file.

Code to reproduce:

import tensorflow as tf
import tensorflow_io as tfio

def ds_from_tsv(label_file, src_path):
    ds = tf.data.experimental.CsvDataset(
        label_file,
        [tf.string, tf.string],
        select_cols=[1, 2],
        field_delim="\t",
        use_quote_delim=False,
        header=True
    )

    ds = ds.map(lambda p, _: tf.strings.join([src_path, p], "/"))

    return ds

def read_mp3():
    # apparently ffmpeg only works in eager mode
    def ffmpeg_decode(path):
        print(path)

        ffmpeg_io = tfio.IOTensor.from_ffmpeg(path)
        audio_io = ffmpeg_io("a:0")
        audio_tensor = audio_io.to_tensor()
        audio_tensor = tf.squeeze(audio_tensor)
        return audio_tensor
    return lambda p: tf.py_function(ffmpeg_decode, [p], tf.int16)

commonvoice_root = "/home/ubuntu/commonvoice_de"

ds = ds_from_tsv(
    commonvoice_root + "/train.tsv",
    commonvoice_root + "/clips"
)

ds = ds.map(read_mp3())

n = 0
for r in ds:
    n += 1
    print(n, r.shape)

I'm working with the German part of the Mozilla CommonVoice dataset. For me the segfault happens after 1020 files. My machine has 20GB RAM, but the memory consumption of the process does look OK to me - not like a memory leak.

I'm running Ubuntu 18.04.3 LTS and ffmpeg 7:3.4.6-0ubuntu0.18.04.1.
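
A crash after roughly 1020 files is close to 1024, a common default soft limit on open file descriptors per process on Linux. Nothing in this thread confirms that descriptors are the cause, so treat this only as a hypothesis to rule in or out. A minimal stdlib-only sketch for checking it follows; the helper names (open_fd_count, fd_report, looks_like_fd_leak) are hypothetical and not part of tensorflow-io.

```python
import os
import resource


def open_fd_count():
    """Count file descriptors currently open in this process.

    Reads /proc/self/fd on Linux and falls back to /dev/fd elsewhere.
    The listing itself briefly uses one descriptor, so the absolute
    number is off by one; differences between two calls are unaffected.
    """
    fd_dir = "/proc/self/fd" if os.path.isdir("/proc/self/fd") else "/dev/fd"
    return len(os.listdir(fd_dir))


def fd_report():
    """Return the current descriptor count together with the process limits."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return {"open": open_fd_count(), "soft_limit": soft, "hard_limit": hard}


def looks_like_fd_leak(before, after, files_processed):
    """Heuristic: the count grew by at least one descriptor per processed file."""
    return files_processed > 0 and (after - before) >= files_processed
```

Calling fd_report() before the loop and again every hundred files or so would show whether the "open" value climbs toward "soft_limit". If it stays flat, the descriptor hypothesis can be dropped.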

yongtang commented 4 years ago

@jjedele Sorry for the late reply; debugging ffmpeg is a little challenging. Still looking into it.

In the meantime, if you only need to read MP3 files, I added an MP3 decoder based on minimp3, which does not require ffmpeg. It also works on Linux/macOS/Windows (ffmpeg only works on Ubuntu 16.04/18.04). You can give it a try with:

tfio.IOTensor.from_audio('audio.mp3')

Note that from_audio handles wav/flac/oggvorbis/mp3 implicitly, so no format needs to be specified.

See PR #801 for more details.
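
As a rough sketch of how the reproduction code above could switch decoders: only tfio.IOTensor.from_audio is taken from this thread. Calling to_tensor() on the result, passing the path through unchanged, and the tf.int16 output dtype are all carried over from the original from_ffmpeg code and are assumptions here. The helper names decode_audio and read_audio and the open_audio parameter are hypothetical; the parameter exists only so the function can be exercised without TensorFlow installed.

```python
def decode_audio(path, open_audio=None):
    """Decode one audio file and return its samples.

    open_audio defaults to tfio.IOTensor.from_audio. The import is done
    lazily so that this function can be tested with a stand-in opener.
    """
    if open_audio is None:
        import tensorflow_io as tfio
        open_audio = tfio.IOTensor.from_audio
    audio_io = open_audio(path)
    # Assumption: the returned object exposes to_tensor(), as the
    # from_ffmpeg stream did in the original code.
    return audio_io.to_tensor()


def read_audio():
    """Build a map function for tf.data, mirroring read_mp3() above."""
    import tensorflow as tf
    # Assumption: tf.int16 is kept from the original code; adjust it if
    # the decoder returns a different dtype.
    return lambda p: tf.py_function(decode_audio, [p], tf.int16)
```

With this in place, the pipeline line would become ds = ds.map(read_audio()), and the rest of the reproduction script stays the same.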

jjedele commented 4 years ago

@yongtang No worries, I can completely see how that's a pain. Thank you for looking into it.

That's awesome! Amusingly, I started exactly the same project (an MP3 read operator based on minimp3), but I'm not there yet since I have worked with neither MP3 decoding nor custom TF operators before. I'll have a look at your code to get some inspiration ;)

yongtang commented 4 years ago

@jjedele Nice to see interest in minimp3. I also created another PR, #805, which attempts to add mp4a support with minimp4 + AVFoundation on macOS. The plan is to use system-native APIs (e.g., on Windows and macOS) when possible, and to fall back to FFmpeg on Linux only for very specific codecs.

tensorflow / io

Segfault when using ffmpeg to decode MP3s #758