Closed asmorkalov closed 1 month ago
Hi. Your repo seems to have an error. I am compiling on CUDA 12.5
Yes, the solution is not complete yet.
Hi. Your repo seems to have an error. I am compiling on CUDA 12.5 ...image instead of text for some reason...
Your error has nothing to do with this PR. You are building against a commit in the contrib repo that predates https://github.com/opencv/opencv_contrib/pull/3378, which removed the line the error mentions:
typedef texture<T, cudaTextureType2D, cudaReadModeElementType> TexRef;
Hey. So you're saying his PR is not on the latest opencv_contrib?
Exactly. This PR is open, not merged, so it lives on asmorkalov's private branch. Additionally, you are not building against the latest commit of opencv_contrib 4.x either, as your error indicates.
About that: I cloned and built only his repo.
Then you must be building against a commit in his repo that predates the PR (https://github.com/opencv/opencv_contrib/pull/3378) which removed the code behind your error. You need to check out his _vardictuple branch.
Either way, you will still get errors with CUDA 12.5 at the moment. The point is that they will be different errors, ones that are actually related to this PR.
@cudawarped @vrabaud could you try the PR on your side and provide comments? I tested with Ubuntu 22.04, CUDA 12.5, cuDNN 9.1 and a GeForce 2080. Please let me know if you observe issues.
I get the following error when building with Windows 11, CUDA 12.5, cuDNN 9.1
[1761/3991] Building CUDA object modules\photo\CMakeFiles\opencv_photo.dir\Debug\src\cuda\nlm.cu.obj
FAILED: modules/photo/CMakeFiles/opencv_photo.dir/Debug/src/cuda/nlm.cu.obj
C:\PROGRA~1\NVIDIA~2\CUDA\v12.5\bin\nvcc.exe -forward-unknown-to-host-compiler -DCVAPI_EXPORTS -D_USE_MATH_DEFINES -D_VARIADIC_MAX=10 -D_WIN32_WINNT=0x0601 -D__OPENCV_BUILD=1 -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -DCMAKE_INTDIR=\"Debug\" -ID:\build\opencv\cuda_12_5_t\3rdparty\ippicv\ippicv_win\icv\include -ID:\build\opencv\cuda_12_5_t\3rdparty\ippicv\ippicv_win\iw\include -ID:\repos\opencv\opencv\modules\photo\src -ID:\repos\opencv\opencv\modules\photo\include -ID:\build\opencv\cuda_12_5_t\modules\photo -ID:\repos\opencv\contrib\modules\cudev\include -ID:\repos\opencv\opencv\modules\core\include -ID:\repos\opencv\contrib\modules\cudaarithm\include -ID:\repos\opencv\opencv\modules\imgproc\include -ID:\repos\opencv\contrib\modules\cudafilters\include -ID:\repos\opencv\contrib\modules\cudaimgproc\include -ID:\repos\opencv\opencv\modules\ts\include -ID:\repos\opencv\opencv\modules\imgcodecs\include -ID:\repos\opencv\opencv\modules\videoio\include -ID:\repos\opencv\opencv\modules\highgui\include -isystem D:\build\opencv\cuda_12_5_t -isystem "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include" -D_WINDOWS -Xcompiler=" /GR /EHsc" -Xcompiler=" -Zi -Ob0 -Od /RTC1" -std=c++14 "--generate-code=arch=compute_86,code=[sm_86]" -Xcompiler=-MDd -D_FORCE_INLINES -Xcompiler=-DCVAPI_EXPORTS -Xcudafe --display_error_number --diag-suppress 1394,1388 "-Xcompiler= /DWIN32 /D_WINDOWS /W4 /GR /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:precise /FS /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /wd4819 /Zi /Ob0 /Od /RTC1 " -MD -MT modules\photo\CMakeFiles\opencv_photo.dir\Debug\src\cuda\nlm.cu.obj -MF modules\photo\CMakeFiles\opencv_photo.dir\Debug\src\cuda\nlm.cu.obj.d -x cu -c D:\repos\opencv\opencv\modules\photo\src\cuda\nlm.cu -o modules\photo\CMakeFiles\opencv_photo.dir\Debug\src\cuda\nlm.cu.obj -Xcompiler=-Fdlib\Debug\opencv_photo4100d.pdb,-FS
D:\repos\opencv\opencv\modules\photo\src\cuda\nlm.cu(421): error: no instance of overloaded function "cv::cuda::device::reduce" matches the argument list
argument types are: (cuda::std::__4::tuple<volatile float *, volatile float *>, cuda::std::__4::tuple<float &, float &>, const unsigned int, const cuda::std::__4::tuple<cv::cuda::device::plus<float>, cv::cuda::device::plus<float>>)
reduce<CTA_SIZE>(Unroll<VecTraits<T>::cn>::template smem_tuple<CTA_SIZE>(cta_buffer),
^
D:\repos\opencv\opencv\modules\core\include\opencv2/core/cuda/reduce.hpp(71): note #3327-D: candidate function template "cv::cuda::device::reduce<N,P0,P1,P2,P3,P4,P5,P6,P7,P8,P9,R0,R1,R2,R3,R4,R5,R6,R7,R8,R9,Op0,Op1,Op2,Op3,Op4,Op5,Op6,Op7,Op8,Op9>(const thrust::THRUST_200400_860_NS::tuple<P0, P1, P2, P3, P4, P5, P6, P7, P8, P9> &, const thrust::THRUST_200400_860_NS::tuple<R0, R1, R2, R3, R4, R5, R6, R7, R8, R9> &, unsigned int, const thrust::THRUST_200400_860_NS::tuple<Op0, Op1, Op2, Op3, Op4, Op5, Op6, Op7, Op8, Op9> &)" failed deduction
__declspec(__device__) __forceinline void reduce(const thrust::tuple<P0, P1, P2, P3, P4, P5, P6, P7, P8, P9>& smem,
^
D:\repos\opencv\opencv\modules\core\include\opencv2/core/cuda/reduce.hpp(63): note #3327-D: candidate function template "cv::cuda::device::reduce<N,T,Op>(volatile T *, T &, unsigned int, const Op &)" failed deduction
__declspec(__device__) __forceinline void reduce(volatile T* smem, T& val, unsigned int tid, const Op& op)
^
detected during:
instantiation of "void cv::cuda::device::imgproc::FastNonLocalMeans<T>::convolve_window(int, int, const int *, T &) const [with T=uchar]" at line 472
instantiation of "void cv::cuda::device::imgproc::FastNonLocalMeans<T>::operator()(cv::cuda::PtrStepSz<T> &) const [with T=uchar]" at line 479
instantiation of "void cv::cuda::device::imgproc::fast_nlm_kernel(cv::cuda::device::imgproc::FastNonLocalMeans<T>, cv::cuda::PtrStepSz<T>) [with T=uchar]" at line 505
instantiation of "void cv::cuda::device::imgproc::nlm_fast_gpu<T>(const cv::cuda::PtrStepSzb &, cv::cuda::PtrStepSzb, cv::cuda::PtrStepi, int, int, float, cudaStream_t) [with T=uchar]" at line 511
D:\repos\opencv\opencv\modules\photo\src\cuda\nlm.cu(421): error: no instance of overloaded function "cv::cuda::device::reduce" matches the argument list
argument types are: (cuda::std::__4::tuple<volatile float *, volatile float *, volatile float *>, cuda::std::__4::tuple<float &, float &, float &>, const unsigned int, const cuda::std::__4::tuple<cv::cuda::device::plus<float>, cv::cuda::device::plus<float>, cv::cuda::device::plus<float>>)
reduce<CTA_SIZE>(Unroll<VecTraits<T>::cn>::template smem_tuple<CTA_SIZE>(cta_buffer),
@cudawarped Thanks for the trial. You also need the corresponding patch in the main repo: https://github.com/opencv/opencv/pull/25658
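For anyone reading the log above, here is a minimal sketch of why the `reduce` candidate in reduce.hpp "failed deduction". It assumes, based only on the `cuda::std::__4::tuple` argument types in the log, that with CUDA 12.4+ `thrust::tuple` resolves to a variadic tuple rather than the older fixed ten-parameter form padded with placeholder types. `std::tuple` stands in for the CUDA types so the sketch compiles without a toolkit, and the names `arity_fixed10`, `arity_variadic`, `fixed10_accepts` and `demo` are hypothetical, not OpenCV or Thrust APIs.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>
#include <utility>

// Old-style overload (hypothetical): exactly ten element types must be deduced,
// mirroring a signature written as tuple<P0, ..., P9>.
template <class P0, class P1, class P2, class P3, class P4,
          class P5, class P6, class P7, class P8, class P9>
int arity_fixed10(const std::tuple<P0, P1, P2, P3, P4, P5, P6, P7, P8, P9>&)
{
    return 10;
}

// Variadic overload (hypothetical): accepts a tuple of any length.
template <class... P>
int arity_variadic(const std::tuple<P...>&)
{
    return static_cast<int>(sizeof...(P));
}

// Detects whether arity_fixed10 can be called with a given tuple type.
template <class Tuple, class = void>
struct fixed10_accepts : std::false_type {};

template <class Tuple>
struct fixed10_accepts<Tuple,
    decltype(void(arity_fixed10(std::declval<const Tuple&>())))> : std::true_type {};

// A two-element variadic tuple cannot be matched against ten parameters...
static_assert(!fixed10_accepts<std::tuple<float*, float*>>::value,
              "deduction fails for a 2-element tuple");
// ...while a tuple with exactly ten elements can.
static_assert(fixed10_accepts<std::tuple<int, int, int, int, int,
                                         int, int, int, int, int>>::value,
              "deduction succeeds for a 10-element tuple");

// Small runtime check of the variadic overload.
int demo()
{
    float a = 1.0f, b = 2.0f;
    return arity_variadic(std::make_tuple(&a, &b));
}
```

With a variadic tuple, a two-element argument no longer has ten template arguments to match, so only a variadic overload (or one written for each arity) can accept it. That is consistent with needing a matching change to reduce.hpp in the main repo, though the actual patch may be structured differently.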
It builds on Windows 11 with both CUDA 12.3 and 12.5 and passes all CUDA tests on an RTX 3070, except DNN and the previously failing ones (https://github.com/opencv/opencv_contrib/issues/3374).
Tries to fix https://github.com/opencv/opencv_contrib/issues/3690 for CUDA 12.4+.
Related patch to main repo: https://github.com/opencv/opencv/pull/25658
Changes:
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request