cudacodec VideoReader frame decoding not working properly

jeshels commented 2 years ago

System information (version)

OpenCV => opencv_contrib_python-4.5.5.64-cp39-cp39-linux_x86_64.whl
Operating System / Platform => Ubuntu 18.04
Compiler => gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)

Detailed description

Reading frames using cudacodec VideoReader yields unexpected frame size and color space type.

Decoding an mp4 video file of 1920x1080 resolution, I receive frames of 1920x1088 resolution (note the 8 extra rows of pixels) and a color space type of cv2.CV_8UC4 (instead of the expected cv2.CV_8UC3). When the frames are displayed on screen via cv2.imshow(), everything looks fine. The extra rows appear only for some of my videos (they do not reproduce with this video), but the unexpected color space type reproduces consistently for all video files.

Here is my build information:


### Installed dependencies:
- Video_Codec_SDK_11.1.5
- cuda_11.3.0_465.19.01
- libcudnn8-dev_8.2.1.32-1+cuda11.3_amd64.deb
- libcudnn8_8.2.1.32-1+cuda11.3_amd64.deb

### OpenCV build information:
> I had to manually add the following to `OpenCVDetectCUDA.cmake` to enable NVCUVID build:
> ```
> set(CUDA_nvcuvid_LIBRARY "/home/Video_Codec_SDK_11.1.5/Lib/linux/stubs/x86_64/libnvcuvid.so")
> set(CUDA_CUDA_LIBRARY "/usr/local/cuda/lib64/stubs/libcuda.so")
> ```

```
General configuration for OpenCV 4.5.5 =====================================
  Version control:               4.5.5-dirty

  Extra modules:
    Location (extra):            /home/opencv-with-cuda/opencv-python/opencv_contrib/modules
    Version control (extra):     4.5.5

  Platform:
    Timestamp:                   2022-03-09T14:49:28Z
    Host:                        Linux 5.4.0-100-generic x86_64
    CMake:                       3.22.2
    CMake generator:             Unix Makefiles
    CMake build tool:            /usr/bin/make
    Configuration:               Release

  CPU/HW features:
    Baseline:                    SSE SSE2 SSE3
      requested:                 SSE3
    Dispatched code generation:  SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
      requested:                 SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
      SSE4_1 (16 files):         + SSSE3 SSE4_1
      SSE4_2 (1 files):          + SSSE3 SSE4_1 POPCNT SSE4_2
      FP16 (0 files):            + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
      AVX (4 files):             + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
      AVX2 (31 files):           + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
      AVX512_SKX (5 files):      + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX

  C/C++:
    Built as dynamic libs?:      NO
    C++ standard:                11
    C++ Compiler:                /usr/bin/c++ (ver 7.5.0)
    C++ flags (Release):         -fsigned-char -ffast-math -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -O3 -DNDEBUG -DNDEBUG
    C++ flags (Debug):           -fsigned-char -ffast-math -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -g -O0 -DDEBUG -D_DEBUG
    C Compiler:                  /usr/bin/cc
    C flags (Release):           -fsigned-char -ffast-math -W -Wall -Werror=return-type -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -O3 -DNDEBUG -DNDEBUG
    C flags (Debug):             -fsigned-char -ffast-math -W -Wall -Werror=return-type -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -g -O0 -DDEBUG -D_DEBUG
    Linker flags (Release):      -Wl,--exclude-libs,libippicv.a -Wl,--exclude-libs,libippiw.a -Wl,--gc-sections -Wl,--as-needed
    Linker flags (Debug):        -Wl,--exclude-libs,libippicv.a -Wl,--exclude-libs,libippiw.a -Wl,--gc-sections -Wl,--as-needed
    ccache:                      NO
    Precompiled headers:         NO
    Extra dependencies:          /usr/lib/x86_64-linux-gnu/libopenblas.so /usr/lib/x86_64-linux-gnu/libjpeg.so /usr/lib/x86_64-linux-gnu/libtiff.so /usr/lib/x86_64-linux-gnu/libz.so /usr/local/cuda/lib64/stubs/libcuda.so /home/Video_Codec_SDK_11.1.5/Lib/linux/stubs/x86_64/libnvcuvid.so Iconv::Iconv m pthread cudart_static dl rt nppc nppial nppicc nppidei nppif nppig nppim nppist nppisu nppitc npps cublas cudnn cufft -L/usr/local/cuda/lib64 -L/usr/lib/x86_64-linux-gnu
    3rdparty dependencies:       libprotobuf ade ittnotify libwebp libpng libopenjp2 IlmImf quirc ippiw ippicv

  OpenCV modules:
    To be built:                 aruco barcode bgsegm bioinspired calib3d ccalib core cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev datasets dnn dnn_objdetect dnn_superres dpm face features2d flann fuzzy gapi hfs highgui img_hash imgcodecs imgproc intensity_transform line_descriptor mcc ml objdetect optflow phase_unwrapping photo plot python3 quality rapid reg rgbd saliency shape stereo stitching structured_light superres surface_matching text tracking video videoio videostab wechat_qrcode xfeatures2d ximgproc xobjdetect xphoto
    Disabled:                    freetype world
    Disabled by dependency:      -
    Unavailable:                 alphamat cvv hdf java julia matlab ovis python2 sfm ts viz
    Applications:                -
    Documentation:               NO
    Non-free algorithms:         NO

  GUI:                           GTK2
    GTK+:                        YES (ver 2.24.32)
      GThread :                  YES (ver 2.56.4)
      GtkGlExt:                  NO
    VTK support:                 NO

  Media I/O:
    ZLib:                        /usr/lib/x86_64-linux-gnu/libz.so (ver 1.2.11)
    JPEG:                        /usr/lib/x86_64-linux-gnu/libjpeg.so (ver 80)
    WEBP:                        build (ver encoder: 0x020f)
    PNG:                         build (ver 1.6.37)
    TIFF:                        /usr/lib/x86_64-linux-gnu/libtiff.so (ver 42 / 4.0.9)
    JPEG 2000:                   build (ver 2.4.0)
    OpenEXR:                     build (ver 2.3.0)
    HDR:                         YES
    SUNRASTER:                   YES
    PXM:                         YES
    PFM:                         YES

  Video I/O:
    DC1394:                      YES (2.2.5)
    FFMPEG:                      YES
      avcodec:                   YES (57.107.100)
      avformat:                  YES (57.83.100)
      avutil:                    YES (55.78.100)
      swscale:                   YES (4.8.100)
      avresample:                YES (3.7.0)
    GStreamer:                   YES (1.14.5)
    v4l/v4l2:                    YES (linux/videodev2.h)

  Parallel framework:            pthreads

  Trace:                         YES (with Intel ITT)

  Other third-party libraries:
    Intel IPP:                   2020.0.0 Gold [2020.0.0]
           at:                   /home/opencv-with-cuda/opencv-python/_skbuild/linux-x86_64-3.9/cmake-build/3rdparty/ippicv/ippicv_lnx/icv
    Intel IPP IW:                sources (2020.0.0)
              at:                /home/opencv-with-cuda/opencv-python/_skbuild/linux-x86_64-3.9/cmake-build/3rdparty/ippicv/ippicv_lnx/iw
    VA:                          NO
    Lapack:                      YES (/usr/lib/x86_64-linux-gnu/libopenblas.so)
    Eigen:                       NO
    Custom HAL:                  NO
    Protobuf:                    build (3.19.1)

  NVIDIA CUDA:                   YES (ver 11.3, CUFFT CUBLAS NVCUVID FAST_MATH)
    NVIDIA GPU arch:             75
    NVIDIA PTX archs:

  cuDNN:                         YES (ver 8.2.1)

  OpenCL:                        YES (no extra features)
    Include path:                /home/opencv-with-cuda/opencv-python/opencv/3rdparty/include/opencl/1.2
    Link libraries:              Dynamic load

  Python 3:
    Interpreter:                 /usr/bin/python3.9 (ver 3.9.10)
    Libraries:                   /usr/lib/x86_64-linux-gnu/libpython3.9.so (ver 3.9.10)
    numpy:                       /tmp/pip-build-env-b1z6otvw/overlay/lib/python3.9/site-packages/numpy/core/include (ver 1.19.3)
    install path:                python/cv2/python-3

  Python (for build):            /usr/bin/python3.9

  Java:
    ant:                         NO
    JNI:                         NO
    Java wrappers:               NO
    Java tests:                  NO

  Install to:                    /home/opencv-with-cuda/opencv-python/_skbuild/linux-x86_64-3.9/cmake-install
-----------------------------------------------------------------
```

Additional note

It is worth mentioning that these issues don't reproduce with my OpenCV build for CUDA 10. Perhaps it is related to the change to CUDA 11?

Steps to reproduce

import cv2

cap = cv2.cudacodec.createVideoReader("outdoor_short.mp4")

while True:
    succeeded, gpu_frame = cap.nextFrame()
    if not succeeded:
        break

    print(f'frame resolution: {gpu_frame.size()}')
    print(f'frame color space type: {gpu_frame.type()}')
    # Copy the decoded frame from the GPU to the host for display.
    cv2.imshow("vid", gpu_frame.download())

    # Press 'q' to stop playback early.
    if cv2.waitKey(25) & 0xFF == ord('q'):
        break

cv2.destroyAllWindows()
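
The value printed by gpu_frame.type() is a plain integer, so it can be hard to tell at a glance whether a frame has 3 or 4 channels. The following is a minimal sketch that decodes such an integer, assuming the OpenCV 4.x type layout (element depth in the low 3 bits, channel count minus one in the bits above). decode_cv_type is a hypothetical helper written for illustration and is not part of cv2.

```python
# Depth names indexed by the low 3 bits of an OpenCV 4.x type code (assumed layout).
DEPTH_NAMES = ["CV_8U", "CV_8S", "CV_16U", "CV_16S",
               "CV_32S", "CV_32F", "CV_64F", "CV_16F"]


def decode_cv_type(type_code: int) -> tuple:
    """Split an OpenCV matrix type code into (depth name, channel count)."""
    depth = DEPTH_NAMES[type_code & 7]
    channels = (type_code >> 3) + 1
    return depth, channels


print(decode_cv_type(24))  # ('CV_8U', 4), the value of cv2.CV_8UC4
print(decode_cv_type(16))  # ('CV_8U', 3), the value of cv2.CV_8UC3
```

With this helper, the reproducer's output of 24 reads as an 8-bit, 4-channel frame, whereas cv2.VideoCapture.read() would give 16 (8-bit, 3 channels).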

Issue submission checklist

[x] I report the issue, it's not a question
[x] I checked the problem with documentation, FAQ, open issues, forum.opencv.org, Stack Overflow, etc and have not found any solution
[x] I updated to the latest OpenCV version and the issue is still there. (These issues don't reproduce with my OpenCV build for CUDA 10.)
[x] There is reproducer code and related data files: videos, images, onnx, etc. The two mentioned example video files:
1. https://user-images.githubusercontent.com/80462280/157727654-ac88f718-b786-4746-ba9a-2166d729874e.mp4
2. https://user-images.githubusercontent.com/80462280/157729453-44571e44-53f4-4698-a330-19be68e0d874.mp4

alalek commented 2 years ago

> opencv_contrib_python-4.5.5.64

If it is a package from PyPI, then there is no CUDA support at all.

You need to build OpenCV from sources with enabled CUDA dependencies.

jeshels commented 2 years ago

> opencv_contrib_python-4.5.5.64
>
> If it is a package from PyPI, then there is no CUDA support at all.
>
> You need to build OpenCV from sources with enabled CUDA dependencies.

I have built OpenCV from source with CUDA features enabled. Kindly refer to the attached build information in my original post.

alalek commented 2 years ago

/cc @cudawarped Any thoughts on this difference between CUDA 10 and 11?

cudawarped commented 2 years ago

> /cc @cudawarped Any thoughts on this difference between CUDA 10 and 11?

I would be surprised if the CUDA version makes a difference, as I would expect the Video SDK version to determine this behaviour, but I would have to check to find out. @jeshels did you use the same Video SDK version in both your CUDA 10 and 11 builds?

> Reading frames using cudacodec VideoReader yields unexpected frame size and color space type.
>
> Taking an mp4 video file of 1920x1080 resolution, I receive frames of 1920x1088 resolution (note the extra 8 pixels) and a color space type of cv2.CV_8UC4 (instead of the expected cv2.CV_8UC3)

Please check out https://github.com/opencv/opencv_contrib/pull/3001 for instructions on how to get the useable area for the frames returned from cudacodec::VideoReader(). FormatInfo::displayArea contains the useable area, while FormatInfo::width/height contain the coded dimensions. It's not pretty, but it works and didn't break the existing API.
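
To illustrate why the coded height can exceed the display height, here is a minimal sketch in Python. It assumes the decoder pads each dimension up to a multiple of 16 pixels, as is the case for H.264 with its 16x16 macroblocks; the actual alignment depends on the codec and decoder, and coded_dimension is a hypothetical helper, not an OpenCV function.

```python
def coded_dimension(display: int, alignment: int = 16) -> int:
    """Round a display dimension up to the next multiple of `alignment`."""
    return (display + alignment - 1) // alignment * alignment


print(coded_dimension(1080))  # 1088: 1080 is not a multiple of 16, so 8 rows are added
print(coded_dimension(1920))  # 1920: already a multiple of 16, so no padding
```

Under that assumption, a 1920x1080 stream would be decoded into a 1920x1088 surface, which matches the frame size reported above.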

As far as I am aware the decoded colour space type is always BGRA for decoding efficiency. I think support for an alpha channel is codec dependent. If this can be determined by the decoder then this information could be included in VideoReader::FormatInfo, but I think the output frame would still have to be BGRA to avoid breaking any existing code.

> I had to manually add the following to OpenCVDetectCUDA.cmake to enable NVCUVID build:
> set(CUDA_nvcuvid_LIBRARY "/home/Video_Codec_SDK_11.1.5/Lib/linux/stubs/x86_64/libnvcuvid.so")
> set(CUDA_CUDA_LIBRARY "/usr/local/cuda/lib64/stubs/libcuda.so")

Did you try passing

-DCUDA_nvcuvid_LIBRARY=/home/Video_Codec_SDK_11.1.5/Lib/linux/stubs/x86_64/libnvcuvid.so

to CMake? This works for me.

jeshels commented 2 years ago

Thank you @cudawarped for all the information!

First, I have to correct myself. I used my CUDA 10 OpenCV build only briefly, so it is possible that the "unexpected behavior" happens there as well and I simply didn't notice it. The actual reason I called it "unexpected" is the difference compared to cv2.VideoCapture.read(). Unfortunately, I am unable to test and verify my earlier claim. Apologies for this.

> @jeshels did you use the same Video SDK version in both your CUDA 10 and 11 builds?

I used different versions: Video_Codec_SDK_10.0.26 for CUDA 10 build, and Video_Codec_SDK_11.1.5 for CUDA 11 build.

> Please check out https://github.com/opencv/opencv_contrib/pull/3001 for instructions on how to get the useable area for the frames returned from cudacodec::VideoReader(). FormatInfo::displayArea contains the useable area, while FormatInfo::width/height contain the coded dimensions. It's not pretty, but it works and didn't break the existing API.

The class FormatInfo and its related API are not exposed to the Python bindings. Is there a Python based solution for this? Also, does this mean that one must manually crop the frame to the useable area before use (for example, before displaying the frames to screen)?

> As far as I am aware the decoded colour space type is always BGRA for decoding efficiency. I think support for an alpha channel is codec dependent. If this can be determined by the decoder then this information could be included in VideoReader::FormatInfo, but I think the output frame would still have to be BGRA to avoid breaking any existing code.

I understand. Is it possible to configure VideoReader to output BGR color space type? My code relies on cv2.VideoCapture() output which produces BGR color space type, and I would like to be able to easily switch between VideoReader and VideoCapture. Otherwise, I'll have to manually convert VideoReader output frames to BGR before using them in my code, due to the difference in frame shape when I download the frame to the CPU.

> Did you try passing
>
> -DCUDA_nvcuvid_LIBRARY=/home/Video_Codec_SDK_11.1.5/Lib/linux/stubs/x86_64/libnvcuvid.so
>
> to the CMake as this works for me?

Yeah, I did. But for some reason it gets unset inside CMake. Here are the shell variables I'm setting before build:

export CMAKE_ARGS="-DWITH_CUDA=ON -DWITH_CUDNN=ON -DOPENCV_DNN_CUDA=ON -DENABLE_FAST_MATH=ON -DCUDA_FAST_MATH=ON -DWITH_CUBLAS=ON -DCUDA_ARCH_BIN=7.5 -DWITH_NVCUVID=ON -DCMAKE_LIBRARY_PATH=/usr/local/cuda/lib64/stubs -DWITH_GTK=ON"
export CUDA_nvcuvid_LIBRARY=/home/Video_Codec_SDK_11.1.5/Lib/linux/stubs/x86_64/libnvcuvid.so
export ENABLE_CONTRIB=1
export CMAKE_BUILD_PARALLEL_LEVEL=$(nproc)

I tried to debug the CMake files but failed to identify the root cause for the resetting of CUDA_nvcuvid_LIBRARY (and other related CUDA variables) values inside the OpenCV CMake files.

cudawarped commented 2 years ago

> The class FormatInfo and its related API are not exposed to the Python bindings. Is there a Python based solution for this? Also, does this mean that one must manually crop the frame to the useable area before use (for example, before displaying the frames to screen)?

Sorry, I didn't realize. I'm adding Python bindings and testing them now on the master branch. It will mean that you need to pass the useable ROI as a NumPy view to all routines which use the frame if you want to ignore the padded lines at the bottom of the decoded frames.

> I understand. Is it possible to configure VideoReader to output BGR color space type? My code relies on cv2.VideoCapture() output which produces BGR color space type, and I would like to be able to easily switch between VideoReader and VideoCapture. Otherwise, I'll have to manually convert VideoReader output frames to BGR before using them in my code, due to the difference in frame shape when I download the frame to the CPU.

Currently no, but I would perform the conversion to BGR on the device before downloading to the host.
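
One way to keep calling code interchangeable between the two readers is a thin wrapper that converts on the device and then downloads. The sketch below is only an illustration: CaptureLikeReader is a hypothetical class, not part of OpenCV, and it assumes the wrapped reader has a nextFrame() method returning (ok, gpu_frame) and that the converted frame has a download() method, as in the reproducer above.

```python
class CaptureLikeReader:
    """Hypothetical adapter giving a cudacodec-style reader a VideoCapture-style read()."""

    def __init__(self, reader, convert):
        self._reader = reader    # object with nextFrame() -> (ok, gpu_frame)
        self._convert = convert  # callable applied on the device, e.g. BGRA -> BGR

    def read(self):
        ok, gpu_frame = self._reader.nextFrame()
        if not ok:
            return False, None
        # Convert while the frame is still on the GPU, then copy it to the host.
        return True, self._convert(gpu_frame).download()


# Intended usage with a CUDA-enabled OpenCV build (left as comments, not run here):
# import cv2
# cap = CaptureLikeReader(
#     cv2.cudacodec.createVideoReader("outdoor_short.mp4"),
#     lambda frame: cv2.cuda.cvtColor(frame, cv2.COLOR_BGRA2BGR),
# )
# ok, frame_bgr = cap.read()
```

The conversion is passed in as a callable so that the wrapper itself stays independent of cv2. This sketch does not crop the padded rows.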

When VideoReader::format() has been exposed to Python you should be able to proceed as below.

import cv2

reader = cv2.cudacodec.createVideoReader("SOURCE")
# Read a frame first so that the decoder has been called and format_info is valid.
ret, frame_bgra = reader.nextFrame()
format_info = reader.format()
# The validity check is not required once a frame has been decoded; it is
# included only to highlight the need for it when that is not the case.
if format_info.valid:
    # Convert BGRA to BGR on the device, then download to the host.
    frame_bgr = cv2.cuda.cvtColor(frame_bgra, cv2.COLOR_BGRA2BGR)
    frame_host = frame_bgr.download()
    # Get a NumPy view of the display area.
    frame_display = frame_host[0:format_info.displayArea[3], 0:format_info.displayArea[2], :]
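
The indexing above starts at row 0 and column 0, which is correct when the display area has no offset. As a hedged generalization, the sketch below builds the row and column slices from a rectangle given as (x, y, width, height), which is the assumed form of displayArea in Python; display_slices is a hypothetical helper name.

```python
def display_slices(display_area):
    """Return (row_slice, col_slice) for a rectangle given as (x, y, width, height)."""
    x, y, width, height = display_area
    return slice(y, y + height), slice(x, x + width)


rows, cols = display_slices((0, 0, 1920, 1080))
print(rows, cols)  # slice(0, 1080, None) slice(0, 1920, None)
```

With a downloaded frame, the crop would then be frame_host[rows, cols], which selects a NumPy view without copying the pixel data.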
