torchaudio.load segfaults in nightly release

Cupcee commented 1 year ago

🐛 Describe the bug

Bug description

The nightly release of torchaudio (built for CUDA 12.1) segfaults when torchaudio.load is called on what seems to be any .wav file; I tried several different ones. The same files read as expected with e.g. scipy.io.wavfile.read or librosa.load in the same environment (see the environment I tested below), so I do not think the files are the issue. I also tested reading the file with the pytorch/pytorch Docker image, and that works. Only this nightly release fails.

Minimal reproducible example (taken from a more complex Dockerfile I'm running)

# run nvidia docker image with cuda 12.1.1
docker run --rm -it nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04 bash

# (we are in container bash now)
# install some packages
apt-get update && apt-get install -y --no-install-recommends \
        g++ \
        make \
        automake \
        autoconf \
        bzip2 \
        unzip \
        wget \
        sox \
        libtool \
        git \
        subversion \
        python2.7 \
        python3 \
        python3-pip \
        python3-dev \
        python3-distutils \
        zlib1g-dev \
        gfortran \
        ca-certificates \
        patch \
        ffmpeg \
        vim && rm -rf /var/lib/apt/lists/*

# install pytorch
pip3 install \
--no-cache-dir \
--pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121

# test torchaudio.load
python3
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torchaudio
>>> torchaudio.load("myfile.wav")
Segmentation fault (core dumped)
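
Because the segfault kills the interpreter, it can be convenient to run the failing call in a child process and inspect its exit status. The following is a minimal sketch, not part of torchaudio: `run_isolated` is a hypothetical helper name, and it relies only on the standard library. It starts the child with `-X faulthandler`, so a Python-level traceback is written to stderr when a fatal signal arrives.

```python
import signal
import subprocess
import sys


def run_isolated(snippet, timeout=120):
    """Run `snippet` in a fresh interpreter and report how it exited.

    Returns (crashed_with_sigsegv, stderr_text). The `-X faulthandler`
    option makes the child dump a Python traceback to stderr when it
    receives a fatal signal such as SIGSEGV.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # On POSIX, a child terminated by a signal has returncode == -signum.
    crashed = proc.returncode == -signal.SIGSEGV
    return crashed, proc.stderr
```

With the file name from the report, a call such as `run_isolated('import torchaudio; torchaudio.load("myfile.wav")')` would be expected to return `True` as its first element in the affected environment, while the parent interpreter keeps running.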

Versions

Collecting environment information...
PyTorch version: 2.1.0.dev20230807+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.2 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.35

Python version: 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.19.0-50-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.1.105
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA A2
Nvidia driver version: 530.30.02
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Gold 6348 CPU @ 2.60GHz
CPU family: 6
Model: 106
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 8
Stepping: 6
BogoMIPS: 5187.81
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq dtes64 vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves wbnoinvd arat avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid fsrm md_clear arch_capabilities
Virtualization: VT-x
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 256 KiB (8 instances)
L1i cache: 256 KiB (8 instances)
L2 cache: 32 MiB (8 instances)
L3 cache: 128 MiB (8 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-7
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Retbleed: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; TSX disabled

Versions of relevant libraries:
[pip3] numpy==1.24.1
[pip3] pytorch-triton==2.1.0+e6216047b8
[pip3] torch==2.1.0.dev20230807+cu121
[pip3] torchaudio==2.1.0.dev20230807+cu121
[pip3] torchvision==0.16.0.dev20230807+cu121
[conda] Could not collect

mthrok commented 1 year ago

Hi @Cupcee

What's the ffmpeg version that is installed? I suspect it's FFmpeg4, but can you try FFmpeg5?
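
One way to answer the version question from Python is to parse the first line of `ffmpeg -version`. The sketch below is only illustrative: `parse_ffmpeg_major` and `installed_ffmpeg_major` are hypothetical helper names. It also assumes that the `ffmpeg` binary on PATH matches the shared libraries torchaudio loads, which is not guaranteed.

```python
import re
import shutil
import subprocess


def parse_ffmpeg_major(version_output):
    """Extract the major version from the first line of `ffmpeg -version`.

    Returns None when that line has no dotted numeric version, for example
    for builds that report a git snapshot identifier instead.
    """
    first_line = version_output.splitlines()[0] if version_output else ""
    match = re.match(r"ffmpeg version n?(\d+)\.\d+", first_line)
    return int(match.group(1)) if match else None


def installed_ffmpeg_major():
    """Return the major version of the `ffmpeg` binary on PATH, or None."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    out = subprocess.run([exe, "-version"], capture_output=True, text=True).stdout
    return parse_ffmpeg_major(out)
```

Calling `installed_ffmpeg_major()` inside the container from the report would indicate which major version the distribution package provides.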

Cupcee commented 1 year ago

Hi @Cupcee

What's the ffmpeg version that is installed? I suspect it's FFmpeg4, but can you try FFmpeg5?

I tried installing FFmpeg5 on the CUDA 12.1 Docker image as instructed:

  docker run --rm -it nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04 bash
  # (in container bash)
  sudo add-apt-repository ppa:savoury1/ffmpeg4
  sudo add-apt-repository ppa:savoury1/ffmpeg5
  sudo apt-get update
  sudo apt-get upgrade && sudo apt-get dist-upgrade
  sudo apt-get install ffmpeg

However, this does not seem to be possible with this image:

The following packages have unmet dependencies:
 ffmpeg : Depends: libavcodec59 (= 7:5.1.3-0ubuntu1~22.04.sav2)
          Depends: libavfilter8 (= 7:5.1.3-0ubuntu1~22.04.sav2)
          Depends: libavformat59 (= 7:5.1.3-0ubuntu1~22.04.sav2)
 libavdevice59 : Depends: libavcodec59 (= 7:5.1.3-0ubuntu1~22.04.sav2)
                 Depends: libavfilter8 (= 7:5.1.3-0ubuntu1~22.04.sav2)
                 Depends: libavformat59 (= 7:5.1.3-0ubuntu1~22.04.sav2)
E: Unable to correct problems, you have held broken packages.

So I take it that torchaudio nightly does not support ffmpeg4 anymore then?

mthrok commented 1 year ago

So I take it that torchaudio nightly does not support ffmpeg4 anymore then?

It does. The nightly build expands FFmpeg support to cover versions 4.1 through 6.0. On some distributions, however, the segfault happens. The code has not changed, so the cause is something else.

What would happen if you set TORCHAUDIO_USE_FFMPEG_VERSION=4 before importing torchaudio?
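
The same thing can be done from inside Python, as long as the variable is set before the import. The sketch below is a minimal illustration: `select_ffmpeg_version` is a hypothetical helper, and only the variable name TORCHAUDIO_USE_FFMPEG_VERSION comes from this thread. The guard against an earlier import is an assumption based on the "before importing" wording above.

```python
import os
import sys


def select_ffmpeg_version(version):
    """Set TORCHAUDIO_USE_FFMPEG_VERSION before torchaudio is imported.

    Raises RuntimeError if torchaudio is already loaded, since the
    suggestion is to set the variable before the import happens.
    """
    if "torchaudio" in sys.modules:
        raise RuntimeError("set the variable before importing torchaudio")
    os.environ["TORCHAUDIO_USE_FFMPEG_VERSION"] = str(version)


select_ffmpeg_version(4)
# import torchaudio  # import only after the variable has been set
```

Setting the variable on the command line, as in `TORCHAUDIO_USE_FFMPEG_VERSION=4 python3`, has the same effect for the whole interpreter session.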

Cupcee commented 1 year ago

What would happen if you set TORCHAUDIO_USE_FFMPEG_VERSION=4 before importing torchaudio?

It still segfaults. I followed the same steps as in my minimal reproducible example, except that I replaced python3 with TORCHAUDIO_USE_FFMPEG_VERSION=4 python3:

root@a59ad6964ade:/# TORCHAUDIO_USE_FFMPEG_VERSION=4 python3
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.getenv("TORCHAUDIO_USE_FFMPEG_VERSION")
'4'
>>> import torchaudio
>>> torchaudio.load("myfile.wav")
Segmentation fault (core dumped)

mthrok commented 1 year ago

Stack trace from gdb.

#0  0x00007fffd9f16da4 in ff_framequeue_add (fq=fq@entry=0x55555a213b30, frame=0x55555a14cd80) at libavfilter/framequeue.c:90
#1  0x00007fffd9ef010d in ff_filter_frame (link=0x55555a213a40, frame=<optimized out>) at libavfilter/avfilter.c:1134
#2  0x00007fffd9ef5bf3 in av_buffersrc_add_frame_flags (ctx=0x55555a21f1c0, frame=0x555559f49700, flags=8) at libavfilter/buffersrc.c:220
#3  0x00007fffdba8b67d in torchaudio::io::detail::(anonymous namespace)::ProcessImpl<torchaudio::io::AudioConverter<(c10::ScalarType)6, true>, torchaudio::io::detail::UnchunkedBuffer>::process_frame(AVFrame*) () from /code/audio/torchaudio/lib/libtorchaudio_ffmpeg4.so
#4  0x00007fffdba93d35 in torchaudio::io::StreamProcessor::send_frame(AVFrame*) () from /code/audio/torchaudio/lib/libtorchaudio_ffmpeg4.so
#5  0x00007fffdba93dc9 in torchaudio::io::StreamProcessor::process_packet(AVPacket*) () from /code/audio/torchaudio/lib/libtorchaudio_ffmpeg4.so
#6  0x00007fffdba97342 in torchaudio::io::StreamReader::process_packet() () from /code/audio/torchaudio/lib/libtorchaudio_ffmpeg4.so
#7  0x00007fffdba97538 in torchaudio::io::StreamReader::process_all_packets() () from /code/audio/torchaudio/lib/libtorchaudio_ffmpeg4.so
#8  0x00007fffdbab5be8 in torchaudio::io::(anonymous namespace)::_load_audio(torchaudio::io::StreamReader&, int, c10::optional<std::string> const&, bool const&) () from /code/audio/torchaudio/lib/libtorchaudio_ffmpeg4.so
#9  0x00007fffdbab6568 in torchaudio::io::(anonymous namespace)::load(std::string const&, c10::optional<std::string> const&, c10::optional<std::string> const&, bool const&) () from /code/audio/torchaudio/lib/libtorchaudio_ffmpeg4.so
#10 0x00007fffdbab8f3d in c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<std::tuple<at::Tensor, long> (*)(std::string const&, c10::optional<std::string> const&, c10::optional<std::string> const&, bool const&), std::tuple<at::Tensor, long>, c10::guts::typelist::typelist<std::string const&, c10::optional<std::string> const&, c10::optional<std::string> const&, bool const&> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) ()
   from /code/audio/torchaudio/lib/libtorchaudio_ffmpeg4.so
#11 0x00007ffff6d03142 in c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const ()
   from /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so
#12 0x00007ffff6a9cb93 in torch::jit::invokeOperatorFromPython(std::vector<std::shared_ptr<torch::jit::Operator>, std::allocator<std::shared_ptr<torch::jit::Operator> > > const&, pybind11::args, pybind11::kwargs const&, c10::optional<c10::DispatchKey>) ()
   from /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so
#13 0x00007ffff6a9d488 in torch::jit::_get_operation_for_overload_or_packet(std::vector<std::shared_ptr<torch::jit::Operator>, std::allocator<std::shared_ptr<torch::jit::Operator> > > const&, c10::Symbol, pybind11::args, pybind11::kwargs const&, bool, c10::optional<c10::DispatchKey>) ()
   from /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so
#14 0x00007ffff6981e00 in pybind11::cpp_function::initialize<torch::jit::initJITBindings(_object*)::{lambda(std::string const&)#194}::operator()(std::string const&) const::{lambda(pybind11::args, pybind11::kwargs)#1}, pybind11::object, pybind11::args, pybind11::kwargs, pybind11::name, pybind11::doc>(torch::jit::initJITBindings(_object*)::{lambda(std::string const&)#194}::operator()(std::string const&) const::{lambda(pybind11::args, pybind11::kwargs)#1}&&, pybind11::object (*)(pybind11::args, pybind11::kwargs), pybind11::name const&, pybind11::doc const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) () from /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so
#15 0x00007ffff6588844 in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) ()
   from /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so
#16 0x00005555556b3e0e in ?? ()
#17 0x00005555556c312b in PyObject_Call ()
#18 0x000055555569f2c1 in _PyEval_EvalFrameDefault ()
#19 0x00005555556a9784 in _PyObject_FastCallDictTstate ()
#20 0x00005555556bf54c in _PyObject_Call_Prepend ()
#21 0x00005555557d81e0 in ?? ()
#22 0x00005555556aa5eb in _PyObject_MakeTpCall ()
#23 0x00005555556a31f1 in _PyEval_EvalFrameDefault ()
#24 0x00005555556b470c in _PyFunction_Vectorcall ()
#25 0x000055555569ce0d in _PyEval_EvalFrameDefault ()
#26 0x00005555556b470c in _PyFunction_Vectorcall ()
#27 0x00005555556a28a2 in _PyEval_EvalFrameDefault ()
#28 0x00005555556b470c in _PyFunction_Vectorcall ()
#29 0x00005555556a28a2 in _PyEval_EvalFrameDefault ()
#30 0x000055555578de56 in ?? ()
#31 0x000055555578dcf6 in PyEval_EvalCode ()
#32 0x00005555557b87d8 in ?? ()
#33 0x00005555557b20bb in ?? ()
#34 0x00005555557b8525 in ?? ()
#35 0x00005555557b7a08 in _PyRun_SimpleFileObject ()
#36 0x00005555557b7653 in _PyRun_AnyFileObject ()
#37 0x00005555557aa41e in Py_RunMain ()
#38 0x0000555555780cad in Py_BytesMain ()
#39 0x00007ffff7c81d90 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#40 0x00007ffff7c81e40 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#41 0x0000555555780ba5 in _start ()

The segmentation fault happens at

https://github.com/FFmpeg/FFmpeg/blob/d61977cbe453869cec28d32b71fe25c2cd965dcf/libavfilter/framequeue.c#L90C1-L91C1

called from ff_filter_frame (libavfilter/avfilter.c:1134, frame #1 in the stack trace above).

The variable b is likely uninitialized.

mthrok commented 1 year ago

It seems that the FFFrameQueue object at AVFilterContext->outputs[0]->fifo is passed to the bucket function, which is likely returning a null reference.

fifo is not accessible from outside. https://ffmpeg.org/doxygen/4.4/avfilter_8h_source.html#l00613

mthrok commented 1 year ago

FFmpeg5 works fine. I compiled the n5.0 tag under the same conditions and it does not segfault.

mthrok commented 1 year ago

With single-version integration, linking Torchaudio against the same FFmpeg libraries, the segmentation fault does not happen. So this may be something specific to the way the new Torchaudio is built, integrated, or linked at runtime.
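
When a runtime-linking problem like this is suspected, it can help to check which FFmpeg shared libraries a process has actually mapped. The following is a diagnostic sketch for Linux only, not the procedure used in this thread: the function names are hypothetical, and it assumes library paths without spaces in the /proc/<pid>/maps format.

```python
import os
import re


def libav_libraries(maps_text):
    """Return the sorted libav*/libsw* shared-object paths found in maps_text.

    `maps_text` is expected to follow the Linux /proc/<pid>/maps layout,
    where the mapped file path is the last whitespace-separated field.
    """
    pattern = re.compile(r"lib(?:av\w+|sw\w+)\.so[.\d]*$")
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split()
        # Mappings backed by a file or a named region have at least six fields.
        if len(fields) >= 6 and pattern.search(os.path.basename(fields[-1])):
            paths.add(fields[-1])
    return sorted(paths)


def loaded_libav_libraries():
    """Inspect the libraries mapped into the current process (Linux only)."""
    with open("/proc/self/maps") as f:
        return libav_libraries(f.read())
```

Calling `loaded_libav_libraries()` after `import torchaudio` would show whether libraries from more than one FFmpeg major version are mapped into the same process.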

mthrok commented 1 year ago

I landed #3561, and tomorrow's nightly should work.

mthrok commented 1 year ago

I tested the latest nightly on Colab and the issue seems to be resolved. Therefore closing. Thanks for the report. Feel free to open an issue if there is something new.
