opencv / opencv_contrib

Repository for OpenCV's extra modules
Apache License 2.0
9.19k stars 5.74k forks

Added CUDA 12.4+ support #3744

Closed asmorkalov closed 1 month ago

asmorkalov commented 1 month ago

Tries to fix https://github.com/opencv/opencv_contrib/issues/3690 for CUDA 12.4+.
Related patch to main repo: https://github.com/opencv/opencv/pull/25658

Changes:

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

Sha-x2-nk commented 1 month ago

Hi. Your repo seems to have an error. I am compiling on CUDA 12.5

image
asmorkalov commented 1 month ago

Yes, the solution is not complete yet.

cudawarped commented 1 month ago

Hi. Your repo seems to have an error. I am compiling on CUDA 12.5 ...image instead of text for some reason...

Your error has nothing to do with this PR. You are building against a commit in the contrib repo that predates https://github.com/opencv/opencv_contrib/pull/3378, which removed the line your error mentions:

typedef texture<T, cudaTextureType2D, cudaReadModeElementType> TexRef;

Sha-x2-nk commented 1 month ago

Hey. So you're saying his PR is not on the latest opencv_contrib?

cudawarped commented 1 month ago

Exactly. This PR is open, not merged, so it lives on asmorkalov's private branch. Additionally, you are not building against the latest commit from opencv_contrib 4.x either, as your error indicates.

Sha-x2-nk commented 1 month ago

About that: I cloned and built only his repo.

cudawarped commented 1 month ago

You must be building against a commit in his repo that predates the PR (https://github.com/opencv/opencv_contrib/pull/3378) which would have removed the error you are getting. You need to check out his _vardictuple branch.

Either way, you will still get errors with CUDA 12.5 at the moment. The point is that they will be different errors, ones that are related to this PR.

asmorkalov commented 1 month ago

@cudawarped @vrabaud could you try the PR on your side and provide comments? I tested with Ubuntu 22.04, CUDA 12.5, cuDNN 9.1 and a GeForce 2080. Please let me know if you observe any issues.

cudawarped commented 1 month ago

I get the following error when building with Windows 11, CUDA 12.5 and cuDNN 9.1:

[1761/3991] Building CUDA object modules\photo\CMakeFiles\opencv_photo.dir\Debug\src\cuda\nlm.cu.obj
FAILED: modules/photo/CMakeFiles/opencv_photo.dir/Debug/src/cuda/nlm.cu.obj
C:\PROGRA~1\NVIDIA~2\CUDA\v12.5\bin\nvcc.exe -forward-unknown-to-host-compiler -DCVAPI_EXPORTS -D_USE_MATH_DEFINES -D_VARIADIC_MAX=10 -D_WIN32_WINNT=0x0601 -D__OPENCV_BUILD=1 -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -DCMAKE_INTDIR=\"Debug\" -ID:\build\opencv\cuda_12_5_t\3rdparty\ippicv\ippicv_win\icv\include -ID:\build\opencv\cuda_12_5_t\3rdparty\ippicv\ippicv_win\iw\include -ID:\repos\opencv\opencv\modules\photo\src -ID:\repos\opencv\opencv\modules\photo\include -ID:\build\opencv\cuda_12_5_t\modules\photo -ID:\repos\opencv\contrib\modules\cudev\include -ID:\repos\opencv\opencv\modules\core\include -ID:\repos\opencv\contrib\modules\cudaarithm\include -ID:\repos\opencv\opencv\modules\imgproc\include -ID:\repos\opencv\contrib\modules\cudafilters\include -ID:\repos\opencv\contrib\modules\cudaimgproc\include -ID:\repos\opencv\opencv\modules\ts\include -ID:\repos\opencv\opencv\modules\imgcodecs\include -ID:\repos\opencv\opencv\modules\videoio\include -ID:\repos\opencv\opencv\modules\highgui\include -isystem D:\build\opencv\cuda_12_5_t -isystem "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include" -D_WINDOWS -Xcompiler=" /GR /EHsc" -Xcompiler=" -Zi -Ob0 -Od /RTC1" -std=c++14 "--generate-code=arch=compute_86,code=[sm_86]" -Xcompiler=-MDd -D_FORCE_INLINES -Xcompiler=-DCVAPI_EXPORTS -Xcudafe --display_error_number --diag-suppress 1394,1388 "-Xcompiler= /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise /FS      /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /wd4819  /Zi /Ob0 /Od /RTC1    " -MD -MT modules\photo\CMakeFiles\opencv_photo.dir\Debug\src\cuda\nlm.cu.obj -MF modules\photo\CMakeFiles\opencv_photo.dir\Debug\src\cuda\nlm.cu.obj.d -x cu -c D:\repos\opencv\opencv\modules\photo\src\cuda\nlm.cu -o modules\photo\CMakeFiles\opencv_photo.dir\Debug\src\cuda\nlm.cu.obj -Xcompiler=-Fdlib\Debug\opencv_photo4100d.pdb,-FS
D:\repos\opencv\opencv\modules\photo\src\cuda\nlm.cu(421): error: no instance of overloaded function "cv::cuda::device::reduce" matches the argument list
            argument types are: (cuda::std::__4::tuple<volatile float *, volatile float *>, cuda::std::__4::tuple<float &, float &>, const unsigned int, const cuda::std::__4::tuple<cv::cuda::device::plus<float>, cv::cuda::device::plus<float>>)
                  reduce<CTA_SIZE>(Unroll<VecTraits<T>::cn>::template smem_tuple<CTA_SIZE>(cta_buffer),
                  ^
D:\repos\opencv\opencv\modules\core\include\opencv2/core/cuda/reduce.hpp(71): note #3327-D: candidate function template "cv::cuda::device::reduce<N,P0,P1,P2,P3,P4,P5,P6,P7,P8,P9,R0,R1,R2,R3,R4,R5,R6,R7,R8,R9,Op0,Op1,Op2,Op3,Op4,Op5,Op6,Op7,Op8,Op9>(const thrust::THRUST_200400_860_NS::tuple<P0, P1, P2, P3, P4, P5, P6, P7, P8, P9> &, const thrust::THRUST_200400_860_NS::tuple<R0, R1, R2, R3, R4, R5, R6, R7, R8, R9> &, unsigned int, const thrust::THRUST_200400_860_NS::tuple<Op0, Op1, Op2, Op3, Op4, Op5, Op6, Op7, Op8, Op9> &)" failed deduction
      __declspec(__device__) __forceinline void reduce(const thrust::tuple<P0, P1, P2, P3, P4, P5, P6, P7, P8, P9>& smem,
                                                ^
D:\repos\opencv\opencv\modules\core\include\opencv2/core/cuda/reduce.hpp(63): note #3327-D: candidate function template "cv::cuda::device::reduce<N,T,Op>(volatile T *, T &, unsigned int, const Op &)" failed deduction
      __declspec(__device__) __forceinline void reduce(volatile T* smem, T& val, unsigned int tid, const Op& op)
                                                ^
          detected during:
            instantiation of "void cv::cuda::device::imgproc::FastNonLocalMeans<T>::convolve_window(int, int, const int *, T &) const [with T=uchar]" at line 472
            instantiation of "void cv::cuda::device::imgproc::FastNonLocalMeans<T>::operator()(cv::cuda::PtrStepSz<T> &) const [with T=uchar]" at line 479
            instantiation of "void cv::cuda::device::imgproc::fast_nlm_kernel(cv::cuda::device::imgproc::FastNonLocalMeans<T>, cv::cuda::PtrStepSz<T>) [with T=uchar]" at line 505
            instantiation of "void cv::cuda::device::imgproc::nlm_fast_gpu<T>(const cv::cuda::PtrStepSzb &, cv::cuda::PtrStepSzb, cv::cuda::PtrStepi, int, int, float, cudaStream_t) [with T=uchar]" at line 511

D:\repos\opencv\opencv\modules\photo\src\cuda\nlm.cu(421): error: no instance of overloaded function "cv::cuda::device::reduce" matches the argument list
            argument types are: (cuda::std::__4::tuple<volatile float *, volatile float *, volatile float *>, cuda::std::__4::tuple<float &, float &, float &>, const unsigned int, const cuda::std::__4::tuple<cv::cuda::device::plus<float>, cv::cuda::device::plus<float>, cv::cuda::device::plus<float>>)
                  reduce<CTA_SIZE>(Unroll<VecTraits<T>::cn>::template smem_tuple<CTA_SIZE>(cta_buffer),
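
The deduction failure in the log above can be reproduced without CUDA. The log suggests that the tuple reaching reduce is a variadic cuda::std::tuple with two or three elements, while the candidate at reduce.hpp(71) names ten element types (P0..P9). The following is a minimal sketch in plain C++, assuming std::tuple as a stand-in for cuda::std::tuple and a hypothetical three-slot padded_tuple as a stand-in for a fixed-arity tuple that pads unused slots with a placeholder type. All names in it (padded_tuple, null_type, reduce_fixed, reduce_variadic, fixed_accepts) are illustrative and are not OpenCV or Thrust API.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Placeholder that pads the unused slots of a fixed-arity tuple (assumption).
struct null_type {};

// Hypothetical fixed-arity tuple: always three parameters, defaults fill the
// rest. The candidate in the log uses ten (P0..P9); three keeps this short.
template <class T0 = null_type, class T1 = null_type, class T2 = null_type>
struct padded_tuple {};

// Fixed-arity overloads: the parameter list names exactly three element types.
template <class P0, class P1, class P2>
void reduce_fixed(const padded_tuple<P0, P1, P2>&) {}

template <class P0, class P1, class P2>
void reduce_fixed(const std::tuple<P0, P1, P2>&) {}

// Variadic overload: a parameter pack deduces any element count.
template <class... P>
constexpr std::size_t reduce_variadic(const std::tuple<P...>&) {
    return sizeof...(P);
}

// Detection idiom: yields std::true_type if reduce_fixed(T) is a viable call.
template <class T>
auto fixed_accepts_impl(int)
    -> decltype(reduce_fixed(std::declval<const T&>()), std::true_type{});
template <class T>
std::false_type fixed_accepts_impl(...);
template <class T>
using fixed_accepts = decltype(fixed_accepts_impl<T>(0));

using padded2   = padded_tuple<float*, float*>;        // really <float*, float*, null_type>
using variadic2 = std::tuple<float*, float*>;          // exactly two elements
using variadic3 = std::tuple<float*, float*, float*>;  // exactly three elements

// Padding keeps the argument count at three, so deduction succeeds.
static_assert(fixed_accepts<padded2>::value, "padded tuple matches");
// Two elements cannot be matched against three named parameters.
static_assert(!fixed_accepts<variadic2>::value, "arity mismatch fails deduction");
// A variadic tuple matches only when its element count happens to agree.
static_assert(fixed_accepts<variadic3>::value, "equal arity matches");

// The variadic overload accepts both element counts.
inline void check_variadic() {
    assert(reduce_variadic(variadic2{}) == 2);
    assert(reduce_variadic(variadic3{}) == 3);
}
```

Under these assumptions, an overload that takes a parameter pack accepts both the two-element and the three-element case seen in the log. Whether the patches referenced in this thread take that route is not shown here.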
asmorkalov commented 1 month ago

@cudawarped Thanks for trying it. You also need the corresponding patch in the main repo: https://github.com/opencv/opencv/pull/25658

cudawarped commented 1 month ago

It builds on Windows 11 with both CUDA 12.3 and 12.5, and passes all CUDA tests on an RTX 3070 except the DNN tests and the previously failing ones (https://github.com/opencv/opencv_contrib/issues/3374).