pytorch / audio

Data manipulation and transformation for audio signal processing, powered by PyTorch
https://pytorch.org/audio
BSD 2-Clause "Simplified" License

streamreader add_video_stream doesn't seem to accept any filter_desc options #3719

Open caspersmit-sa opened 6 months ago

caspersmit-sa commented 6 months ago

🐛 Describe the bug

I'm calling add_video_stream on my StreamReader with the following options:

vr.add_video_stream(
    frames_per_chunk=decode_size,
    decoder=codec,
    decoder_option={"threads": "0", "gpu": "0"},
    hw_accel="cuda",
    filter_desc="format=pix_fmts=rgb24",
)

Unfortunately, this fails with RuntimeError: Failed to configure the graph: Function not implemented. If I remove the filter_desc option, the code runs normally. However, the StreamReader is not very useful to me if the output is yuv444p rather than rgb24. Is there a way to fix this (without moving to the nightly build), or are there any alternatives?

Versions

PyTorch version: 2.1.2+cu118
Is CUDA available: True
[pip3] numpy==1.24.1
[pip3] torch==2.1.2+cu118
[pip3] torchaudio==2.1.2+cu118
[pip3] torchvision==0.16.2+cu118
[pip3] triton==2.1.0

mthrok commented 6 months ago

Unfortunately, this is not possible, because FFmpeg does not implement such a function. When HW acceleration is used, the decoded frames are kept in CUDA memory, and only a handful of filters can handle CUDA memory. The format filter (format=pix_fmts=...) does not support CUDA frames.

If you search for _cuda in https://ffmpeg.org/ffmpeg-filters.html, you will find the filters that support CUDA frames. The closest one for color conversion is the scale_cuda filter, but its documentation says: "The filter does not support converting between YUV and RGB pixel formats."

A workaround is to convert to YUV444 and do the YUV->RGB conversion on the PyTorch tensor, like this:
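A minimal sketch of that conversion, assuming the decoded chunk is an 8-bit planar yuv444p tensor of shape (N, 3, H, W) with the Y, U, V planes in that order, in limited ("TV") range. The function name yuv444_to_rgb and the coefficient table are illustrative and not part of torchaudio; the BT.601 and BT.709 values are the standard limited-range coefficients, and which matrix is correct depends on how the video was encoded.

```python
import torch

# Limited-range ("TV range") YUV -> RGB coefficients: (R<-V, G<-U, G<-V, B<-U).
# Assumption: 8-bit limited-range input; BT.601 is typical for SD content,
# BT.709 for most HD content. Pick the one matching the source video.
_COEFFS = {
    "bt601": (1.596, -0.391, -0.813, 2.018),
    "bt709": (1.793, -0.213, -0.533, 2.112),
}


def yuv444_to_rgb(frames: torch.Tensor, matrix: str = "bt601") -> torch.Tensor:
    """Convert planar YUV444 frames (N, 3, H, W), uint8, to RGB uint8.

    Pure tensor math, so the result stays on the input's device (CPU or CUDA).
    """
    if frames.dim() != 4 or frames.size(1) != 3:
        raise ValueError(f"expected (N, 3, H, W), got {tuple(frames.shape)}")
    rv, gu, gv, bu = _COEFFS[matrix]

    x = frames.to(torch.float32)
    y = (x[:, 0] - 16.0) * 1.164  # stretch luma from [16, 235] to [0, 255]
    u = x[:, 1] - 128.0           # center chroma around zero
    v = x[:, 2] - 128.0

    r = y + rv * v
    g = y + gu * u + gv * v
    b = y + bu * u

    rgb = torch.stack([r, g, b], dim=1)  # back to (N, 3, H, W)
    return rgb.round().clamp(0, 255).to(torch.uint8)
```

Because the conversion is ordinary tensor arithmetic, frames decoded with hw_accel="cuda" can be converted on the GPU without a round trip to CPU memory; each chunk yielded by vr.stream() could be passed through it, e.g. rgb = yuv444_to_rgb(chunk).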