pytorch / audio

Data manipulation and transformation for audio signal processing, powered by PyTorch
https://pytorch.org/audio
BSD 2-Clause "Simplified" License

StreamReader failing when reading RTSP stream with CPU #3798

Open pedromoraesh opened 1 month ago

pedromoraesh commented 1 month ago

🐛 Describe the bug

(screenshot of the error)

import torch
import torchaudio

print(torch.__version__)
print(torchaudio.__version__)

import os
import time

import matplotlib.pyplot as plt
from torchaudio.io import StreamReader
from torchaudio.utils import ffmpeg_utils
import torchvision

# Set environment variable to select FFmpeg 4
os.environ["TORIO_USE_FFMPEG_VERSION"] = "4"

print("FFmpeg Library versions:")
for k, ver in ffmpeg_utils.get_versions().items():
    print(f"  {k}:\t{'.'.join(str(v) for v in ver)}")

    print("Available NVDEC Decoders:")
for k in ffmpeg_utils.get_video_decoders().keys():
    if "cuvid" in k:
        print(f" - {k}")

print("Avaialbe GPU:")
print(torch.cuda.get_device_properties(0))

src = "<SOME_RTSP_URL>"
s = StreamReader(src)
s.add_video_stream(5, decoder="hevc")
s.fill_buffer()
(video,) = s.pop_chunks()

print(video.shape, video.dtype, video.device)

Before hitting this issue, I had been using FFmpeg 6 and was getting an error about the threads and gpu parameters in decoder_option.

(screenshot of the FFmpeg 6 error)

import torch
import torchaudio

print(torch.__version__)
print(torchaudio.__version__)

import os
import time

import matplotlib.pyplot as plt
from torchaudio.io import StreamReader
from torchaudio.utils import ffmpeg_utils
import torchvision

# Set environment variable to select FFmpeg 6
os.environ["TORIO_USE_FFMPEG_VERSION"] = "6"

print("FFmpeg Library versions:")
for k, ver in ffmpeg_utils.get_versions().items():
    print(f"  {k}:\t{'.'.join(str(v) for v in ver)}")

    print("Available NVDEC Decoders:")
for k in ffmpeg_utils.get_video_decoders().keys():
    if "cuvid" in k:
        print(f" - {k}")

print("Avaialbe GPU:")
print(torch.cuda.get_device_properties(0))

src = "<SOME_RTSP_URL>"
s = StreamReader(src)
s.add_video_stream(5, decoder="hevc_cuvid", hw_accel="cuda:0", decoder_option={"gpu": "0"})
s.fill_buffer()
(video,) = s.pop_chunks()

print(video.shape, video.dtype, video.device)

Versions

Collecting environment information...
PyTorch version: 2.3.0+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.2 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4070 Ti
Nvidia driver version: 552.22
cuDNN version: Probably one of the following:
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn.so.8.9.7
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.9.7
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.9.7
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.9.7
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.9.7
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.9.7
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.9.7
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 20
On-line CPU(s) list: 0-19
Vendor ID: GenuineIntel
Model name: 13th Gen Intel(R) Core(TM) i5-13600K
CPU family: 6
Model: 183
Thread(s) per core: 2
Core(s) per socket: 10
Socket(s): 1
Stepping: 1
BogoMIPS: 6988.79
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq vmx ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves avx_vnni umip waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize flush_l1d arch_capabilities
Virtualization: VT-x
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 480 KiB (10 instances)
L1i cache: 320 KiB (10 instances)
L2 cache: 20 MiB (10 instances)
L3 cache: 24 MiB (1 instance)
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Mitigation; Enhanced IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] optree==0.11.0
[pip3] torch==2.3.0+cu118
[pip3] torchaudio==2.3.0+cu118
[pip3] torchreid==0.2.5
[pip3] torchvision==0.18.0+cu118
[pip3] triton==2.3.0
[conda] Could not collect

tunmx commented 4 weeks ago

I have the same problem, have you solved it?

pedromoraesh commented 4 weeks ago

I have the same problem, have you solved it?

I did test with FFmpeg 4 and got the first error; with FFmpeg 6 I got the second one about the threads and gpu options. So far I haven't been able to solve it. I tried processing using FFmpeg from source, but it's not optimized; I tried PyAV, but it doesn't support the GPU; and I tried tensor-stream, but I get a segmentation fault. No options for now: in my tests torchaudio only supports RTMP, and torchvision uses PyAV, which can't even process RTMP.
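
For context, the PyAV route is roughly the CPU-only sketch below (the URL and the rtsp_transport option are placeholders); PyAV has no equivalent of hw_accel, which is why it doesn't give me the NVDEC path:

import av
import torch

src = "rtsp://<SOME_RTSP_URL>"  # placeholder

# CPU-only decode; there is no NVDEC/hw_accel option in PyAV.
container = av.open(src, options={"rtsp_transport": "tcp"})
stream = container.streams.video[0]

for frame in container.decode(stream):
    rgb = frame.to_ndarray(format="rgb24")        # (H, W, 3) uint8 array
    chw = torch.from_numpy(rgb).permute(2, 0, 1)  # (3, H, W) tensor
    print(chw.shape, chw.dtype)
    break

container.close()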

tunmx commented 4 weeks ago

I tried setting up a streaming service using mediamtx and pushing the stream with ffmpeg, and it was successfully decoded with StreamReader. However, when I switched to using a real camera device's RTSP stream today, I encountered the same issue as you.😂😂

pedromoraesh commented 4 weeks ago

I deployed my own RTMP server; H264 works fine there, but for RTSP I've tried every single thing, and it seems torch won't fix this soon... I'm trying different codecs, parameters, and that kind of stuff, but no results at the moment.

tunmx commented 4 weeks ago

Looking through the source code, it does not seem to support the yuvj420p pixel format; the source may need to be modified to enable it: https://github.com/pytorch/audio/blob/main/src/libtorio/ffmpeg/stream_reader/conversion.cpp
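
To check whether the camera is really sending yuvj420p, a quick sketch like this (the URL is a placeholder) prints each source stream's codec and pixel format without decoding any frames:

from torchaudio.io import StreamReader

src = "rtsp://<CAMERA_RTSP_URL>"  # placeholder

s = StreamReader(src)
for i in range(s.num_src_streams):
    info = s.get_src_stream_info(i)
    # For video streams, `format` is the pixel format (e.g. yuv420p or yuvj420p).
    print(i, info.media_type, getattr(info, "codec", None), getattr(info, "format", None))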

Another method is to use ffmpeg to fetch the stream from the remote device, decode it, and re-encode it into a format supported by StreamReader; the locally re-streamed data can then be fed to StreamReader. However, this method requires re-encoding, which consumes more CPU resources.

ffmpeg -i "rtsp://example/" -c:v h264_nvenc -pix_fmt yuv420p -f rtsp rtsp://0.0.0.0:8554/stream

pedromoraesh commented 3 weeks ago

Looking through the source code, it does not seem to support the yuvj420p pixel format; the source may need to be modified to enable it: https://github.com/pytorch/audio/blob/main/src/libtorio/ffmpeg/stream_reader/conversion.cpp

This can help me find a solution. Yesterday I tried the dev release, but still no luck. I'm thinking about something: StreamReader accepts some filter options that are passed on to FFmpeg. Maybe one of those filters can convert the image without restreaming.
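
Something along these lines is what I have in mind; I'm not sure the format filter actually avoids the unsupported conversion, but filter_desc is how a filter string would be passed (the URL is a placeholder):

from torchaudio.io import StreamReader

src = "rtsp://<SOME_RTSP_URL>"  # placeholder

s = StreamReader(src)
# CPU decode, asking FFmpeg's filter graph to convert whatever the camera
# sends (e.g. yuvj420p) to plain yuv420p before frames reach the buffer.
s.add_video_stream(5, decoder="hevc", filter_desc="format=yuv420p")
s.fill_buffer()
(video,) = s.pop_chunks()
print(video.shape, video.dtype, video.device)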

Another method is to use ffmpeg to fetch the stream from the remote device, decode it, and re-encode it into a format supported by StreamReader; the locally re-streamed data can then be fed to StreamReader. However, this method requires re-encoding, which consumes more CPU resources.

ffmpeg -i "rtsp://example/" -c:v h264_nvenc -pix_fmt yuv420p -f rtsp rtsp://0.0.0.0:8554/stream

Agreed, this could be CPU intensive and kill the optimization we get from StreamReader.