Added CUDA 12.4+ support

asmorkalov commented 1 month ago

Tries to fix https://github.com/opencv/opencv_contrib/issues/3690 for CUDA 12.4+. Related patch to the main repo: https://github.com/opencv/opencv/pull/25658

Changes:

Added branches to support the new variadic implementation of thrust::tuple.
Added a branch that uses std::array instead of std::tuple in split-merge and grid operations. The new branch avoids the namespace clash between cv::cuda in OpenCV and ::cuda in the CUDA standard library (injected by Thrust). The old tuple branches are kept for compatibility with existing code and with CUDA versions before 12.4.
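A minimal host-only C++ sketch of the two ideas in the last item, assuming nothing beyond the standard library. It is not the PR's code: `std_like`, `box`, `unwrap` and `split_channels` are hypothetical names, and `::cuda::std_like` merely stands in for the `::cuda::std` namespace that arrives through Thrust. Inside `cv::cuda::device`, an unqualified `cuda::` finds OpenCV's `cv::cuda` first, so the global namespace has to be written as `::cuda`. A split whose outputs are passed as `std::array` sidesteps the problem, because no tuple type appears in its signature.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the CUDA C++ standard library namespace
// (::cuda::std) that Thrust brings in; the contents are made up.
namespace cuda { namespace std_like {
template <class T>
struct box { T value; };
} }  // namespace cuda::std_like

namespace cv { namespace cuda { namespace device {

// Writing `cuda::std_like::box` here would not compile: unqualified lookup
// of `cuda` finds cv::cuda before the global ::cuda, and cv::cuda has no
// std_like member. The leading :: selects the global namespace explicitly.
inline int unwrap(const ::cuda::std_like::box<int>& b) { return b.value; }

// Hypothetical split: de-interleave `pixels` N-channel pixels into N planes.
// Destinations are passed as std::array<T*, N> instead of a tuple, so the
// channel count is one compile-time parameter and no tuple type is needed.
template <std::size_t N, class T>
void split_channels(const T* src, std::size_t pixels, const std::array<T*, N>& dst)
{
    assert(src != nullptr);
    for (std::size_t i = 0; i < pixels; ++i)
        for (std::size_t c = 0; c < N; ++c)
            dst[c][i] = src[i * N + c];
}

} } }  // namespace cv::cuda::device
```

The real split/merge kernels operate on GPU images; this sketch only illustrates the signature shape and the name-lookup rule.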

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

[x] I agree to contribute to the project under Apache 2 License.
[x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
[x] The PR is proposed to the proper branch
[x] There is a reference to the original bug report and related work
[ ] There is an accuracy test, performance test and test data in the opencv_extra repository, if applicable. Patch to opencv_extra has the same branch name.
[ ] The feature is well documented and sample code can be built with the project CMake

Sha-x2-nk commented 1 month ago

Hi. Your repo seems to have an error. I am compiling with CUDA 12.5.

asmorkalov commented 1 month ago

Yes, the solution is not complete yet.

cudawarped commented 1 month ago

Your error has nothing to do with this PR. You are building against a commit in the contrib repo from before https://github.com/opencv/opencv_contrib/pull/3378 was merged; that PR removed the line your error mentions:

typedef texture<T, cudaTextureType2D, cudaReadModeElementType> TexRef;

Sha-x2-nk commented 1 month ago

Hey. So you're saying his PR is not on the latest opencv_contrib?

cudawarped commented 1 month ago

Exactly. This PR is open, not merged, so it only exists on asmorkalov's private branch. Additionally, as your error indicates, you are not building against the latest commit of opencv_contrib 4.x either.

Sha-x2-nk commented 1 month ago

About that: I only cloned and built his repo.

cudawarped commented 1 month ago

You must be building against a commit in his repo that predates the PR (https://github.com/opencv/opencv_contrib/pull/3378) which removed the code behind your error. You need to check out his _vardictuple branch.

Either way, you will still get errors with CUDA 12.5 at the moment. The difference is that they will be errors related to this PR.

asmorkalov commented 1 month ago

@cudawarped @vrabaud could you try the PR on your side and provide comments? I tested with Ubuntu 22.04, CUDA 12.5, cuDNN 9.1 and GeForce 2080. Please let me know if you observe issues.

cudawarped commented 1 month ago

I get the following error when building with Windows 11, CUDA 12.5 and cuDNN 9.1:

[1761/3991] Building CUDA object modules\photo\CMakeFiles\opencv_photo.dir\Debug\src\cuda\nlm.cu.obj
FAILED: modules/photo/CMakeFiles/opencv_photo.dir/Debug/src/cuda/nlm.cu.obj
C:\PROGRA~1\NVIDIA~2\CUDA\v12.5\bin\nvcc.exe -forward-unknown-to-host-compiler -DCVAPI_EXPORTS -D_USE_MATH_DEFINES -D_VARIADIC_MAX=10 -D_WIN32_WINNT=0x0601 -D__OPENCV_BUILD=1 -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -DCMAKE_INTDIR=\"Debug\" -ID:\build\opencv\cuda_12_5_t\3rdparty\ippicv\ippicv_win\icv\include -ID:\build\opencv\cuda_12_5_t\3rdparty\ippicv\ippicv_win\iw\include -ID:\repos\opencv\opencv\modules\photo\src -ID:\repos\opencv\opencv\modules\photo\include -ID:\build\opencv\cuda_12_5_t\modules\photo -ID:\repos\opencv\contrib\modules\cudev\include -ID:\repos\opencv\opencv\modules\core\include -ID:\repos\opencv\contrib\modules\cudaarithm\include -ID:\repos\opencv\opencv\modules\imgproc\include -ID:\repos\opencv\contrib\modules\cudafilters\include -ID:\repos\opencv\contrib\modules\cudaimgproc\include -ID:\repos\opencv\opencv\modules\ts\include -ID:\repos\opencv\opencv\modules\imgcodecs\include -ID:\repos\opencv\opencv\modules\videoio\include -ID:\repos\opencv\opencv\modules\highgui\include -isystem D:\build\opencv\cuda_12_5_t -isystem "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include" -D_WINDOWS -Xcompiler=" /GR /EHsc" -Xcompiler=" -Zi -Ob0 -Od /RTC1" -std=c++14 "--generate-code=arch=compute_86,code=[sm_86]" -Xcompiler=-MDd -D_FORCE_INLINES -Xcompiler=-DCVAPI_EXPORTS -Xcudafe --display_error_number --diag-suppress 1394,1388 "-Xcompiler= /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise /FS      /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /wd4819  /Zi /Ob0 /Od /RTC1    " -MD -MT modules\photo\CMakeFiles\opencv_photo.dir\Debug\src\cuda\nlm.cu.obj -MF modules\photo\CMakeFiles\opencv_photo.dir\Debug\src\cuda\nlm.cu.obj.d -x cu -c D:\repos\opencv\opencv\modules\photo\src\cuda\nlm.cu -o modules\photo\CMakeFiles\opencv_photo.dir\Debug\src\cuda\nlm.cu.obj -Xcompiler=-Fdlib\Debug\opencv_photo4100d.pdb,-FS
D:\repos\opencv\opencv\modules\photo\src\cuda\nlm.cu(421): error: no instance of overloaded function "cv::cuda::device::reduce" matches the argument list
            argument types are: (cuda::std::__4::tuple<volatile float *, volatile float *>, cuda::std::__4::tuple<float &, float &>, const unsigned int, const cuda::std::__4::tuple<cv::cuda::device::plus<float>, cv::cuda::device::plus<float>>)
                  reduce<CTA_SIZE>(Unroll<VecTraits<T>::cn>::template smem_tuple<CTA_SIZE>(cta_buffer),
                  ^
D:\repos\opencv\opencv\modules\core\include\opencv2/core/cuda/reduce.hpp(71): note #3327-D: candidate function template "cv::cuda::device::reduce<N,P0,P1,P2,P3,P4,P5,P6,P7,P8,P9,R0,R1,R2,R3,R4,R5,R6,R7,R8,R9,Op0,Op1,Op2,Op3,Op4,Op5,Op6,Op7,Op8,Op9>(const thrust::THRUST_200400_860_NS::tuple<P0, P1, P2, P3, P4, P5, P6, P7, P8, P9> &, const thrust::THRUST_200400_860_NS::tuple<R0, R1, R2, R3, R4, R5, R6, R7, R8, R9> &, unsigned int, const thrust::THRUST_200400_860_NS::tuple<Op0, Op1, Op2, Op3, Op4, Op5, Op6, Op7, Op8, Op9> &)" failed deduction
      __declspec(__device__) __forceinline void reduce(const thrust::tuple<P0, P1, P2, P3, P4, P5, P6, P7, P8, P9>& smem,
                                                ^
D:\repos\opencv\opencv\modules\core\include\opencv2/core/cuda/reduce.hpp(63): note #3327-D: candidate function template "cv::cuda::device::reduce<N,T,Op>(volatile T *, T &, unsigned int, const Op &)" failed deduction
      __declspec(__device__) __forceinline void reduce(volatile T* smem, T& val, unsigned int tid, const Op& op)
                                                ^
          detected during:
            instantiation of "void cv::cuda::device::imgproc::FastNonLocalMeans<T>::convolve_window(int, int, const int *, T &) const [with T=uchar]" at line 472
            instantiation of "void cv::cuda::device::imgproc::FastNonLocalMeans<T>::operator()(cv::cuda::PtrStepSz<T> &) const [with T=uchar]" at line 479
            instantiation of "void cv::cuda::device::imgproc::fast_nlm_kernel(cv::cuda::device::imgproc::FastNonLocalMeans<T>, cv::cuda::PtrStepSz<T>) [with T=uchar]" at line 505
            instantiation of "void cv::cuda::device::imgproc::nlm_fast_gpu<T>(const cv::cuda::PtrStepSzb &, cv::cuda::PtrStepSzb, cv::cuda::PtrStepi, int, int, float, cudaStream_t) [with T=uchar]" at line 511

D:\repos\opencv\opencv\modules\photo\src\cuda\nlm.cu(421): error: no instance of overloaded function "cv::cuda::device::reduce" matches the argument list
            argument types are: (cuda::std::__4::tuple<volatile float *, volatile float *, volatile float *>, cuda::std::__4::tuple<float &, float &, float &>, const unsigned int, const cuda::std::__4::tuple<cv::cuda::device::plus<float>, cv::cuda::device::plus<float>, cv::cuda::device::plus<float>>)
                  reduce<CTA_SIZE>(Unroll<VecTraits<T>::cn>::template smem_tuple<CTA_SIZE>(cta_buffer),
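The log pinpoints the failure: the only tuple overload of `cv::cuda::device::reduce` in `reduce.hpp` takes tuples with exactly ten type parameters (`P0` ... `P9`), while the arguments here are `cuda::std::tuple` types with two or three elements, so template deduction fails. A minimal host-side sketch of a variadic alternative, assuming plain `std::tuple` and C++14 (`reduce_all` and `reduce_all_impl` are hypothetical names; the real `reduce` performs a shared-memory reduction inside a CUDA block, which this does not model), could look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <tuple>
#include <utility>

// Hypothetical analogue of a multi-value reduce: for every k and every
// tuple element i, out_i = op_i(out_i, data_i[k]). Element types and the
// tuple length are deduced from parameter packs, so 2-, 3- or 4-element
// tuples all match this one overload.
template <class... T, class... R, class... Op, std::size_t... I>
void reduce_all_impl(const std::tuple<T*...>& data, std::size_t n,
                     const std::tuple<R&...>& out, const std::tuple<Op...>& op,
                     std::index_sequence<I...>)
{
    for (std::size_t k = 0; k < n; ++k) {
        // C++14 pack expansion: one assignment per tuple element.
        int expand[] = {0, ((std::get<I>(out) =
                                 std::get<I>(op)(std::get<I>(out), std::get<I>(data)[k])),
                            0)...};
        (void)expand;
    }
}

template <class... T, class... R, class... Op>
void reduce_all(const std::tuple<T*...>& data, std::size_t n,
                const std::tuple<R&...>& out, const std::tuple<Op...>& op)
{
    static_assert(sizeof...(T) == sizeof...(R) && sizeof...(R) == sizeof...(Op),
                  "data, out and op must have the same number of elements");
    reduce_all_impl(data, n, out, op, std::index_sequence_for<T...>{});
}
```

For example, `reduce_all(std::make_tuple(a, b), 3, std::tie(sa, sb), std::make_tuple(std::plus<float>(), std::plus<float>()))` sums two arrays in one pass. The `static_assert` rejects mismatched tuple lengths at compile time; the old ten-slot `thrust::tuple` instead padded unused slots with a placeholder type.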

asmorkalov commented 1 month ago

@cudawarped Thanks for the trial. You also need the corresponding patch in the main repo: https://github.com/opencv/opencv/pull/25658

cudawarped commented 1 month ago

With the main-repo patch, it builds on Windows 11 for both CUDA 12.3 and 12.5 and passes all CUDA tests on an RTX 3070, except DNN and the previously failing ones (https://github.com/opencv/opencv_contrib/issues/3374).

opencv / opencv_contrib: Added CUDA 12.4+ support #3744