pytorch / audio

Data manipulation and transformation for audio signal processing, powered by PyTorch
https://pytorch.org/audio
BSD 2-Clause "Simplified" License

torchaudio.io.StreamReader crashes on repeated seek / next #3733

Open MichaelCurrie opened 5 months ago

MichaelCurrie commented 5 months ago

🐛 Describe the bug

When I run the following script repeatedly, it crashes before reaching "End" in roughly 75% of runs. Sometimes it crashes after the first seek, sometimes after the second, and sometimes it finishes completely.

I can overcome the issue by creating a new StreamReader instance every time I seek, but presumably this workaround will slow things down.

import os
import sys

import requests
import torchaudio

print(torchaudio.__version__)

url = "https://assets.allsamplefiles.com/mp4/ns/60s/sample-file-quad-hd.mp4"
src = "D:\\sample-file-quad-hd.mp4"

if not os.path.isfile(src):
    print(f"Downloading video file {url}")
    response = requests.get(url)
    if response.status_code == 200:
        with open(src, "wb") as file:
            file.write(response.content)
    else:
        print(f"FAILED: HTTP status {response.status_code}")
        sys.exit(1)  # non-zero exit status signals the failed download
    print("Done downloading video file")

assert os.path.isfile(src)

stream_params = {
    "frames_per_chunk": 2,
    "buffer_chunk_size": -1,
    "stream_index": None,
    "decoder": "h264_cuvid",
    "decoder_option": {
        "deint": "2",
        "gpu": "0",
        "surfaces": "32",
    },
    "hw_accel": "cuda:0",
}

start_times = [0.0, 1.0677333333333334, 2.135466666666667]

stream_reader = torchaudio.io.StreamReader(src=src)
print("add_video_stream")
stream_reader.add_video_stream(**stream_params)

for i, start_time in enumerate(start_times):
    print(f"Before {i}")
    stream_reader.seek(start_time)
    chunk = next(stream_reader.stream())

print("End")

Note: the NV12 conversion warning in the output below seems unrelated.

For example, here is the output from a run that stopped after printing "Before 1":

(py311) C:\Users\Bank\Desktop>python torch_test.py
2.1.0+cu121
Downloading video file https://assets.allsamplefiles.com/mp4/ns/60s/sample-file-quad-hd.mp4
Done downloading video file
[W conversion.cpp:412] Warning: The output format NV12 is selected. This will be implicitly converted to YUV444P, in which all the color components Y, U, V have the same dimension. (function operator ())
Before 0
Before 1

Versions

Collecting environment information...
PyTorch version: 2.1.2+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: version 3.26.0-rc5
Libc version: N/A

Python version: 3.11.5 | packaged by Anaconda, Inc. | (main, Sep 11 2023, 13:26:23) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19045-SP0
Is CUDA available: True
CUDA runtime version: 11.7.64
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 2070 SUPER
GPU 1: NVIDIA GeForce RTX 2070 SUPER

Nvidia driver version: 537.42
cuDNN version: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin\cudnn_ops_train64_8.dll
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=3696
DeviceID=CPU0
Family=179
L2CacheSize=10240
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=3696
Name=Intel(R) Core(TM) i9-10900X CPU @ 3.70GHz
ProcessorType=3
Revision=21767

Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.1
[pip3] torch==2.1.2+cu121
[pip3] torchaudio==2.1.2+cu121
[pip3] torchvision==0.16.2+cu121
[conda] cudatoolkit 11.1.1 hb074779_12 conda-forge
[conda] numpy 1.23.5 pypi_0 pypi

MichaelCurrie commented 5 months ago

I'm running CUDA 11.7, which is probably why it crashes with torchaudio 2.1.2+cu121 (a build made against CUDA 12.1):

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_19:00:59_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
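The mismatch between the installed toolkit and the wheel can also be checked with a few lines of Python. This is only a sketch: the helper names `toolkit_release` and `wheel_cuda` are made up for illustration, and it assumes the usual `+cuXYZ` wheel suffix, where the last digit is the minor version.

```python
import re
from typing import Optional


def toolkit_release(nvcc_output: str) -> Optional[str]:
    """Return the 'release X.Y' number found in `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def wheel_cuda(version: str) -> Optional[str]:
    """Return the CUDA version encoded in a '+cuXYZ' suffix (assumed format)."""
    match = re.search(r"\+cu(\d+)(\d)$", version)
    return f"{match.group(1)}.{match.group(2)}" if match else None


# Strings copied from the output in this issue.
nvcc_text = "Cuda compilation tools, release 11.7, V11.7.64"
wheel_version = "2.1.2+cu121"

if toolkit_release(nvcc_text) != wheel_cuda(wheel_version):
    print("toolkit:", toolkit_release(nvcc_text), "wheel:", wheel_cuda(wheel_version))
```

In a live session, `wheel_version` would come from `torchaudio.__version__`, and `nvcc_text` from running `nvcc --version`.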

Unfortunately not: even after upgrading CUDA to 12.1, the crash still occurs. To work around it, either:

  1. Create a new StreamReader instance every time you seek, or

  2. Downgrade to CUDA 11.7, downgrade torch to the last version compatible with 11.7, and install ffmpeg 4.3.1:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
conda install -c conda-forge ffmpeg=4.3.1
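Workaround 1 could look roughly like the sketch below. The helper names `first_chunks` and `make_torchaudio_reader` are hypothetical and not part of torchaudio. The only torchaudio calls used are the ones already in the script above (`StreamReader`, `add_video_stream`, `seek`, `stream`). The reader is built through a factory so that no instance is ever reused after a seek.

```python
from typing import Any, Callable, Iterable, Iterator


def first_chunks(
    make_reader: Callable[[], Any], start_times: Iterable[float]
) -> Iterator[Any]:
    """Yield the first chunk at each start time, using a fresh reader per seek."""
    for start_time in start_times:
        reader = make_reader()  # new instance; never reused after seeking
        reader.seek(start_time)
        yield next(iter(reader.stream()))


def make_torchaudio_reader(src: str, stream_params: dict) -> Any:
    """Hypothetical factory: build a StreamReader configured as in the script."""
    import torchaudio  # imported lazily so the helper above has no hard dependency

    reader = torchaudio.io.StreamReader(src=src)
    reader.add_video_stream(**stream_params)
    return reader
```

With `src`, `stream_params` and `start_times` defined as in the original script, usage would be `for chunk in first_chunks(lambda: make_torchaudio_reader(src, stream_params), start_times): ...`. Each iteration pays the cost of reopening the file and re-initialising the decoder, which is the slowdown mentioned in the original report.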