opencv / opencv

Open Source Computer Vision Library
https://opencv.org
Apache License 2.0

CUDA backend for the DNN module #14827

Closed YashasSamaga closed 4 years ago

YashasSamaga commented 4 years ago

More up-to-date info available here (unofficial)


How to build and use the CUDA backend?

How to use multiple GPUs?

There are many ways to make use of multiple GPUs. Here is one which I think is the safest and the least complex solution. It relies on the fact that the CUDA runtime tracks the current device separately for each CPU thread.

Suppose you have N devices.

  1. Create N threads.
  2. Assign a CUDA device to each thread by calling cudaSetDevice or cv::cuda::setDevice in that thread. Each thread is now associated with a device.
  3. You can create any number of cv::dnn::Net objects in any of those threads and the network will use the device associated with that thread for memory and computation.
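
The steps above can be sketched in Python as follows. This is only an illustration of the thread structure, not code from the PR: `run_per_device`, `fake_set_device` and `fake_work` are hypothetical names, and the device-binding call is passed in as a parameter so the sketch runs without a GPU.

```python
import threading


def run_per_device(num_devices, set_device, work):
    """Run work(device_id) on one thread per device and collect the results.

    set_device is whatever call binds the calling thread to a device
    (an assumption of this sketch, e.g. cv2.cuda.setDevice in a CUDA build).
    """
    results = [None] * num_devices

    def worker(device_id):
        # Bind this thread to its device before creating any network.
        set_device(device_id)
        # Networks created from here on would use that device.
        results[device_id] = work(device_id)

    # One thread per device.
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_devices)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Stand-in for cudaSetDevice: remembers the device chosen by each thread.
_current = threading.local()


def fake_set_device(device_id):
    _current.device = device_id


def fake_work(device_id):
    # A real worker would build a cv::dnn::Net here and run inference.
    return (device_id, _current.device)
```

With a CUDA-enabled build of the Python bindings, one would presumably pass `cv2.cuda.setDevice` as `set_device` and load the network inside `work`, so that each network is created on the thread that owns its device.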

Benchmarks

Demo Video: https://www.youtube.com/watch?v=ljCfluWYymM

Project summary/benchmarks: https://gist.github.com/YashasSamaga/a84cf2826ab2dc755005321fe17cd15d

Support Matrix for this PR

## Current Support Matrix: (not updated)

Blip | Meaning
---- | ---------
✔️ | supports all the configurations that are supported by all the existing backends (and might support more than what's currently supported)
🔵 | partially supported (fallback to CPU for unsupported configurations)
:x: | not supported (fallback to CPU)

Layer | Status | Constraints | Notes
---------------------------------------- | ------ | ------------- | --------------
Activations | ✔️ | |
Batch Normalization | ✔️ | |
Blank Layer | ✔️ | |
Concat Layer | ✔️ | |
Const Layer | ✔️ | |
Convolution 2d | ✔️ | | asymmetric padding is disabled in layer constructor but the backend supports it
Convolution 3d | ✔️ | | asymmetric padding is disabled in layer constructor but the backend supports it
Crop and resize | :x: | |
Crop Layer | ✔️ | | forwarded to Slice Layer
Detection Output Layer | :x: | |
Deconvolution 2d | 🔵 | padding configuration should not lead to extra uneven padding |
Deconvolution 3d | 🔵 | padding configuration should not lead to extra uneven padding |
Elementwise Layers | ✔️ | |
Eltwise Layer | ✔️ | |
Flatten Layer | ✔️ | |
Fully Connected Layer | ✔️ | |
Input Layer | :x: | |
Interp Layer | ✔️ | |
Local Response Normalization | ✔️ | |
Max Unpooling 2d | ✔️ | |
Max Unpooling 3d | ✔️ | |
MVN Layer | :x: | |
Normalize Layer | 🔵 | Only L1 and L2 norm supported |
Padding Layer | ✔️ | |
Permute Layer | ✔️ | |
Pooling 2d | 🔵 | Only max and average pooling supported | supports asymmetric padding
Pooling 3d | 🔵 | Only max and average pooling supported | supports asymmetric padding
Prior Box Layer | ✔️ | |
Proposal Layer | :x: | |
Region Layer | ✔️ | NMS performed using CPU |
Reorg Layer | ✔️ | |
Reshape Layer | ✔️ | |
Resize Layer | ✔️ | |
Scale Layer | ✔️ | |
Shift Layer | ✔️ | | forwarded to Scale Layer
Shuffle Channel Layer | ✔️ | |
Slice Layer | ✔️ | |
Softmax Layer | ✔️ | |
Split Layer | ✔️ | |
LSTM Layer | :x: | |

Known issues:

  1. Tests for some of the SSD based networks fail on Jetson Nano

References: #14585

Results:

force_builders_only=Custom,linux,docs
buildworker:Custom=linux-4
docker_image:Custom=ubuntu-cuda:18.04
YashasSamaga commented 4 years ago

Do I have to use CV_OVERRIDE and CV_FINAL? I presume that they were added for portability, but since both final and override are part of C++11, should they still be used?

Can I use std::shared_ptr instead of cv::Ptr? There isn't a make_shared equivalent and makePtr doesn't do what std::make_shared does.

Is it fine to force push occasionally when there isn't any dependent stuff like reviews in between?

alalek commented 4 years ago

CV_OVERRIDE and CV_FINAL

They are used to avoid excessive merge issues from the 3.4 branch. Since your code is in the master branch only, this problem does not apply, so you can use the C++ keywords/modifiers.

use std::shared_ptr instead of cv::Ptr

Feel free to use std::shared_ptr (but it is not supported by bindings generator, so be careful with public API).

makePtr doesn't do what std::make_shared does.

In master branch it is just a wrapper, so it should do the same things.

Is it fine to force push

It is OK. Also rebasing is preferred over "merge" commits (it is easy to do that using 1 squashed commit: squash first, then rebase).

davisking commented 4 years ago

Seems like it would be implementation defined at worst, rather than UB. You sure it’s UB? If it’s ok in c++17 and works in our case I think it’s fine. I would be surprised if some compilers defined std::iterator_traits<T>::iterator_category for non iterators in c++11.

applied-machinelearning commented 4 years ago

First the good (or great): I have done some tests and I can build OpenCV and successfully run both test and perf on my workstation with a "Pascal" generation card (compute capability 6.1) and CUDA 10.1 with gcc-8 as compiler. It does generate quite a few warnings like:

In file included from /usr/local/cuda/include/cuda_fp16.h:2524:0,
                 from /mnt/storage/src/opencv/opencv-cudadnn-buildtest/modules/dnn/src/cuda/activations.cu:6:
/usr/local/cuda/include/cuda_fp16.hpp:279:6: warning: "__CUDA_ARCH__" is not defined, evaluates to 0 [-Wundef]
 #if (__CUDA_ARCH__ >= 530 || !defined(__CUDA_ARCH__)) && !defined(__CUDA_NO_HALF2_OPERATORS__)
      ^~~~~

The included "cuda_fp16.h" and "cuda_fp16.hpp" have a lot of __CUDA_ARCH__ ifdef'ery around fp16 datatype support for different compute capabilities. So somehow nvcc doesn't seem to have that macro defined while compiling the .cu files. However, the manual states:

5.7.4. Virtual Architecture Identification Macro The architecture identification macro __CUDA_ARCH__ is assigned a three-digit value string xy0 (ending in a literal 0) during each nvcc compilation stage 1 that compiles for compute_xy. This macro can be used in the implementation of GPU functions for determining the virtual architecture for which it is currently being compiled. The host code (the non-GPU code) must not depend on it.

It looks like it should have been defined, as this is device and not host code? However, it doesn't seem to impact the build for this card generation / compute capability.

And now the somewhat less good: trying to build on an NVIDIA Jetson Nano (compute capability 5.3) fails. NVIDIA currently only provides an image with CUDA 10.0, cuDNN 7.3.1 and gcc-7 as compiler, so I'm stuck on that for the moment.

I have attached the CMakeVars.txt and the complete build logs for both machines.

pascal: opencv-cudadnn-buildtest-pascal-gcc.txt CMakeVars.txt
nano: opencv-cudadnn-buildtest-jetsonnano-gcc.txt CMakeVars.txt

YashasSamaga commented 4 years ago

@applied-machinelearning That's surprising. I have two tests failing on my PC (and I expect them to fail on any PC).

The warnings are emitted by the CUDA headers which are broken (I think so). That's what I can infer from this NVIDIA DevTalk post.

The PR uses the tensor transform API to add asymmetric padding (required for the "same" padding mode) in the convolution layer. This API was added in cuDNN 7.5.0 (Release Notes). Hence, the build fails on the Nano.

The minimum version of cuDNN required is 7.5.0. The CMake doesn't reflect this yet [TODO].

@davisking what should be the minimum version of cuDNN supported? The latest version is 7.6.2 and the current minimum is 7.5.0.

@alalek Is it possible to run the accuracy and performance tests on CI? The cuda build doesn't seem to run the tests.

There are two warnings from an unrelated module:

/build/precommit_custom_linux/opencv_contrib/modules/cudalegacy/src/cuda/NCVBroxOpticalFlow.cu(1124): warning: variable "p_threads" was declared but never referenced

/build/precommit_custom_linux/opencv_contrib/modules/cudalegacy/src/cuda/NCVBroxOpticalFlow.cu(958): warning: variable "dThreads" was declared but never referenced
applied-machinelearning commented 4 years ago

@applied-machinelearning That's surprising. I have two tests failing on my PC (and I expect them to fail on any PC).

Sorry for not being precise, there are some individual tests failing, but the test and perf scripts finish without segmentation faults and other grave errors.

The minimum version of cuDNN required is 7.5.0. The CMake doesn't reflect this yet [TODO].

Thank you for clearing that up, although it's a pity, since at the moment that would rule out the Jetson embedded boards, for which NVIDIA also doesn't support OpenCL. Unfortunately, NVIDIA doesn't provide individual downloads/releases of its libraries for aarch64. Hopefully they will release new images with CUDA 10.1 and an updated cuDNN soon.

alalek commented 4 years ago

@YashasSamaga Great progress!

Is it possible to run the accuracy and performance tests on CI

No, we don't have CUDA GPUs in CI. One of the problems is prohibition of NVIDIA drivers installation for normal GPUs:

two warnings from an unrelated module

just ignore them (or feel free to prepare separate PR which suppresses/eliminates these warnings, like #15267)

davisking commented 4 years ago

I think requiring 7.5.0 is fine.

YashasSamaga commented 4 years ago

NOTE: the devices used in the test are low-end mobile devices

CPU: i7 7700HQ
GPU: NVIDIA GTX 1050 Mobile

BLAS Library: MKL 2019.0.4
CUDA Version: 10.1
cuDNN: 7.6.2

Warmup Runs: 3 (forward pass is performed three times before benchmarks)
Benchmark Runs: 10 (the average of ten forward passes is reported)

Test Code: https://gist.github.com/YashasSamaga/71157cf0c3768c497e5e70fb95435596
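
The warmup-then-average procedure described above can be sketched as follows. This is not the linked test code; `benchmark` and its parameter names are hypothetical, and `forward` stands for any callable that performs one forward pass.

```python
import time


def benchmark(forward, warmup_runs=3, benchmark_runs=10):
    """Return the average wall-clock time of forward() in milliseconds."""
    for _ in range(warmup_runs):
        forward()  # warmup passes: their timings are discarded
    start = time.perf_counter()
    for _ in range(benchmark_runs):
        forward()
    elapsed = time.perf_counter() - start
    # Average over the timed passes only, converted from seconds to ms.
    return elapsed * 1000.0 / benchmark_runs
```

The warmup passes matter because the first forward passes typically include one-time costs such as memory allocation and library initialization, which would otherwise inflate the average.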

Notes:

Backend Comparison

Batch Size = 1

Model | CUDA FP32 | Inference Engine CPU | OpenCV CPU
----- | --------- | -------------------- | ----------
GoogLeNet | 7.2447ms | 10.4981ms | 17.9176ms
DenseNet121 | 12.6324ms | 19.1823ms | 48.0628ms
EAST Text Detection | 18.8281ms | 49.0508ms | 88.9429ms
ENet | 11.5014ms | Exception | 62.5854ms
FastNeuralStyle StaryNight | 27.498ms | 178.309ms | 160.359ms
Inception 5h | 7.8546ms | 22.2789ms | 20.3255ms
Inception v2 FasterRCNN | 112.736ms | Exception | 374.26ms
MobileNet SSD | 58.4751ms | 9.2896ms | 27.3061ms
OpenCV Face Detector | 6.9831ms | 8.3981ms | 17.6683ms
OpenPose Pose MPI | 160.561ms | 509.446ms | 838.161ms
Resnet 50 | 11.3603ms | 28.1529ms | 50.2752ms
SqueezeNet | 2.4084ms | 3.2918ms | 5.476ms
VGG16 SSD | 70.4117ms | 249.725ms | 360.207ms
Yolo v3 | 57.9822ms | 214.629ms | 296.806ms
Yolo v2 | 51.5784ms | 193.453ms | 260.19ms

Batch Size = 10

Model | CUDA FP32 | Inference Engine CPU | OpenCV CPU
----- | --------- | -------------------- | ----------
GoogLeNet | 35.7556ms | 108.946ms | 225.928ms
DenseNet121 | 74.9241ms | 295.105ms | 650.924ms
EAST Text Detection | 149.58ms | 536.946ms | 1273.93ms
FastNeuralStyle StaryNight | 283.173ms | 1966.5ms | 2175.3ms
Inception 5h | 36.6225ms | 180.429ms | 233.276ms
MobileNet SSD | 277.753ms | 111.872ms | 316.063ms
OpenCV Face Detector | 52.4366ms | 95.7866ms | 202.657ms
OpenPose Pose MPI | 628.617ms | 5650.05ms | 10683.5ms
Resnet 50 | 74.283ms | 230.817ms | 541.308ms
SqueezeNet | 15.8144ms | 35.4915ms | 69.4122ms
VGG16 SSD | 594.286ms | 2796.23ms | 4661.51ms
Yolo v3 | 488.704ms | 2419.8ms | 4209.74ms
Yolo v2 | 491.414ms | 2185.47ms | 3788.34ms

OpenCV OpenCL vs CUDA FP32 (on NVIDIA GPU)

Batch Size = 1

Model | CUDA FP32 | OpenCV OpenCL NVIDIA
----- | --------- | --------------------
GoogLeNet | 7.5951ms | 56.218ms
DenseNet121 | 12.9375ms | 110.564ms
EAST Text Detection | 19.1325ms | 309.341ms
ENet | 11.8922ms | 38.8476ms
FastNeuralStyle StaryNight | 29.69ms | 346.566ms
Inception 5h | 8.8545ms | 57.4015ms
Inception v2 FasterRCNN | 114.535ms | 2244.24ms
MobileNet SSD | 57.6893ms | 148.459ms
OpenCV Face Detector | 6.9666ms | 59.9923ms
OpenPose Pose MPI | 162.01ms | 2377.14ms
Resnet 50 | 11.9307ms | 176.066ms
SqueezeNet | 2.4413ms | 14.6637ms
VGG16 SSD | 70.8822ms | 1288.96ms
Yolo v3 | 58.133ms | 1168.71ms
Yolo v2 | 53.5697ms | 1016.73ms

Integrated Graphics (IG) vs NVIDIA (using OpenCV backend's OpenCL target)

Batch Size = 1

Model | OpenCV OpenCL IG | OpenCV OpenCL FP16 IG | OpenCV OpenCL NVIDIA
----- | ---------------- | --------------------- | --------------------
GoogLeNet | 15.5681ms | 11.7769ms | 56.218ms
DenseNet121 | 49.4344ms | 56.3869ms | 110.564ms
EAST Text Detection | 86.0381ms | 80.949ms | 309.341ms
ENet | 27.4152ms | Exception | 38.8476ms
FastNeuralStyle StaryNight | 105.712ms | 132.263ms | 346.566ms
Inception 5h | 17.4537ms | 14.6988ms | 57.4015ms
Inception v2 FasterRCNN | 358.018ms | 374.585ms | 2244.24ms
MobileNet SSD | 20.5701ms | 21.3236ms | 148.459ms
OpenCV Face Detector | 21.3481ms | 26.4779ms | 59.9923ms
OpenPose Pose MPI | 888.518ms | 870.852ms | 2377.14ms
Resnet 50 | 33.1333ms | 25.5099ms | 176.066ms
SqueezeNet | 5.877ms | 5.865ms | 14.6637ms
VGG16 SSD | 425.423ms | 353.651ms | 1288.96ms
Yolo v3 | 339.913ms | 338.573ms | 1168.71ms
Yolo v2 | 446.899ms | 314.487ms | 1016.73ms
applied-machinelearning commented 4 years ago

After I found an updated image for the jetson nano (cuda 10, cudnn 7.5.0), i have been able to compile opencv with cudadnn.

Unfortunately most tests fail with:

unknown file: Failure C++ exception with description "OpenCV(4.1.1-dev) /mnt/storage/src/opencv/opencv-cudadnn-buildtest/modules/dnn/src/cuda4dnn/csl/memory.hpp:263: error: (-217:Gpu API call) operation not supported in function 'MemoryLockGuard' " thrown in the test body.

I have seen this exception also on my x86 / GTX 1060 when trying to run some python scripts (some work, some fail with this error).

CMakeVars.txt buildlog.txt opencv-dnn-test-log.txt

BTW: Very impressive benchmark results you posted !

YashasSamaga commented 4 years ago

This has something to do with Jetson not supporting the ability to page-lock already allocated memory.

https://devtalk.nvidia.com/default/topic/1032259/jetson-tx2/cudaerrornotsupported-when-calling-cv-cuda-cudahostregister-on-nvidia-tx2/post/5256704/

MemoryLockGuard essentially page-locks host memory. This boosts the host to device memory transfer bandwidth (almost doubles on my PC) and also allows the transfer to happen asynchronously.

This also improves inference time when the network has layers which do not have CUDA implementations. It may be insignificant for large networks though.

But yes, it doesn't seem nice to rule out all the Jetson devices. I am currently thinking of adding a build option or maybe a runtime-option (disabled by default) to prevent page-locking.

An alternate solution is to allocate page-locked memory during allocation instead of having to later lock it. But this is mostly not possible as the host memory is allocated by the DNN backbone code which is independent of the CUDA backend.


Yes, there is a test failing on my x86 PC due to MemoryLockGuard for attempting to lock memory which is already locked. I will be rolling out a fix for this soon. I am speculating that this is the error you are also facing.

Do you know what situation causes the error? What's different in the codes that work and that don't?

applied-machinelearning commented 4 years ago

But yes, it doesn't seem nice to rule out all the Jetson devices. I am currently thinking of adding a build option or maybe a runtime-option (disabled by default) to prevent page-locking.

Could it perhaps be a build option that is enabled automatically on the Arm architecture, as the referenced post seems to suggest it is a problem on Arm only?

An alternate solution is to allocate page-locked memory during allocation instead of having to later lock it. But this is mostly not possible as the host memory is allocated by the DNN backbone code which is independent of the CUDA backend.

Yes, there is a test failing on my x86 PC due to MemoryLockGuard for attempting to lock memory which is already locked. I will be rolling out a fix for this soon. I am speculating that this is the error you are also facing.

Do you know what situation causes the error? What's different in the codes that work and that don't?

No unfortunately I haven't been able to figure it out. All the hunches I had, I incorporated in the simple test script, but that doesn't fail.

YashasSamaga commented 4 years ago

I won't be adding new code to the PR anymore. The commits from now on will be limited to bug fixes and refactoring. I will create separate PRs and improve upon this PR in the coming months.

@alalek @dkurt PR is ready for review.

applied-machinelearning commented 4 years ago

Got some test and perf results for the jetson nano. opencv-dnn-perf-log.txt opencv-dnn-test-log.txt

YashasSamaga commented 4 years ago

The Deconvolution3D test is failing because the DNN backbone code has attempted to allocate an internal blob with zero size which has been caught by an assertion in the CUDA backend. If I remember correctly, this is a recent issue which wasn't happening until I rebased a week ago.

https://github.com/opencv/opencv/blob/fa55dc8ded28e0261c3c3370170b0155e22c429b/modules/dnn/src/dnn.cpp#L1925

/cc @alalek I am not sure if this is a bug. Deconvolution3D appears to request a zero sized internal blob.

[TODO BUG] The CUDA backend has its own system of managing internal blobs and hence I shouldn't even be allocating from these internal blobs. I'll skip the wrap() call for the CUDA backend. This should free up some (maybe significant) precious GPU memory.

The SSD networks aren't failing on my PC. The outputs certainly look wrong though. Can you try running the tests on your GTX 1060 and check if they fail? @applied-machinelearning

applied-machinelearning commented 4 years ago

The SSD networks aren't failing on my PC. The outputs certainly look wrong though. Can you try running the tests on your GTX 1060 and check if they fail? @applied-machinelearning

You are correct, they aren't failing on my other machine. opencv-dnn-test-log-pascal.txt

Main differences are (Jetson vs my workstation with Pascal card):
cuda: 10.0 vs 10.1
cudnn: 7.5.0 vs 7.6.2
compute capability: 5.3 vs 6.0 / 6.1 (Maxwell vs Pascal, note that normal Maxwell is 5.0 / 5.2, the Jetson Nano's 5.3 seems to have different handling around the FP16 stuff)
arch: aarch64 vs x86 (thinking out loud: issue with something like endianness in handing over to CPU layers?)

Another thing I noticed: I tried to compile with compute capability 5.0 / 5.2 on my workstation (to rule out whether the compute capability has anything to do with the above problem), but that fails on the __half stuff, which seems to be necessary only for the CUDA_FP16 stuff. If I remember correctly, CUDA_FP16 only makes sense performance-wise for later cards, so perhaps CUDA_FP16 support as a whole should depend on the compute capability being at least Pascal? That would probably make all these compile errors go away. See the build log: buildlog.txt

applied-machinelearning commented 4 years ago

Good news, commit e4e6759, seems to have fixed the last failing test: Test_ONNX_layers.Deconvolution3D/0, where GetParam() = CUDA/CUDA

I have done some more digging around with the python problem. After commit "ignore memory lock failures" it now fails on: cv2.error: OpenCV(4.1.1-dev) /mnt/storage/opencv/opencv-cudadnn/modules/dnn/src/cuda4dnn/csl/memory.hpp:54: error: (-217:Gpu API call) initialization error in function 'ManagedPtr'.

So I instrumented that with cudaMemGetInfo and the requested malloc size. When using the simple Python script it functions properly:

cudaMemGetInfo free: 6103695360 total: 6373179392 Malloc size: 2076672

But when trying the more involved script it fails:

cudaMemGetInfo free: 0 total: 0 Malloc size: 2076672

So it seems cuda isn't initialized properly. That is probably due to the more involved script being multiprocess / multithreaded. I will see if I can make a minimal python script that exhibits the problem.

rcaltin commented 4 years ago

Hello, thank you for your efforts.

I got successful tests on the platforms below with my 300x300px trained Inception v2 SSD TensorFlow model:

Test Platform # 1 : GTX-1050-TI & i7-7700HQ & Win10 : ~22 fps on CPU (DNN_BACKEND_OPENCV & DNN_TARGET_CPU), ~49 fps on CPU with IE (DNN_BACKEND_INFERENCE_ENGINE & DNN_TARGET_CPU), ~46 fps on CUDA GPU (DNN_BACKEND_CUDA & DNN_TARGET_CUDA)

Test Platform # 2 : GTX-1050-Mobile & i5-8300H & Win10 : ~9 fps on CPU (DNN_BACKEND_OPENCV & DNN_TARGET_CPU), ~21 fps on CPU with IE (DNN_BACKEND_INFERENCE_ENGINE & DNN_TARGET_CPU), ~41 fps on CUDA GPU (DNN_BACKEND_CUDA & DNN_TARGET_CUDA)

catree commented 4 years ago

@YashasSamaga Awesome work!

Any idea why CUDA MobileNet SSD performs badly? Probably some missing layers in the CUDA backend?

What could be the reasons of the bad performance of OpenCL backend on Nvidia? It performs worse than OpenCL on Intel IG.

Is it because the OpenCL drivers on Nvidia are suboptimal? Is it because the kernels are tuned for Intel IG? A combination of both maybe?

YashasSamaga commented 4 years ago

Any idea why CUDA MobileNet SSD performs badly? Probably some missing layers in the CUDA backend?

cuDNN performs very poorly for depthwise convolutions. It launches thousands of kernels. Hopefully, this will be fixed in a future version of cuDNN.

The only missing layer is DetectionOutputLayer which doesn't take a toll on the performance as it appears at the end of the network. In fact, it's generally faster on the CPU than on a CUDA device (it's also partly because of my inability to write a kernel which can outperform the CPU).

What could be the reasons of the bad performance of OpenCL backend on Nvidia? It performs worse than OpenCL on Intel IG. Is it because the OpenCL drivers on Nvidia are suboptimal? Is it because the kernels are tuned for Intel IG? A combination of both maybe?

The OpenCV backend's OCL implementation frequently uses the CPU target as a fallback. This target switch is very cheap as the IG and CPU share the same memory.

NVIDIA devices have their own dedicated graphics memory. Every time a fallback is used, you'll have to transfer the memory from the device to the host. This is very costly. So costly that any benefits that are to be gained are completely outweighed by the cost of the intermediate memory transfers.

OpenCL limits the ability to exploit the full capability of CUDA devices. It offers far less control than what a CUDA backend could have.

YashasSamaga commented 4 years ago

GTX 1080 Ti Benchmark

CPU: 2x Intel Xeon E5-2640 v4 (40 logical cores)
GPU: 1x NVIDIA GTX 1080 Ti (11 GB)

CUDA Version: 10.0
cuDNN: 7.6.2

Warmup Runs: 3 (forward pass is performed three times before benchmarks)
Benchmark Runs: 10 (the average of ten forward passes is reported)

Test Code: https://gist.github.com/YashasSamaga/71157cf0c3768c497e5e70fb95435596

Backend Comparison

Batch Size = 1

Model | CUDA FP32 | OpenCV CPU
----- | --------- | ----------
GoogLeNet | 4.8824ms | 14.2981ms
DenseNet121 | 6.4555ms | 57.8244ms
EAST Text Detection | 5.901ms | 67.4301ms
ENet | 4.5979ms | 30.2767ms
FastNeuralStyle StaryNight | 5.3193ms | 51.3313ms
Inception 5h | 4.9487ms | 16.0048ms
Inception v2 FasterRCNN | 82.0298ms | 179.245ms
MobileNet SSD | 70.9177ms | 23.9348ms
OpenCV Face Detector | 4.9288ms | 15.4205ms
OpenPose Pose MPI | 30.5954ms | 246.747ms
Resnet 50 | 4.5968ms | 45.1153ms
SqueezeNet | 1.0888ms | 3.6492ms
VGG16 SSD | 23.5926ms | 194.976ms
Yolo v3 | 18.0002ms | 141.861ms
Yolo v2 | 12.1279ms | 111.642ms

Batch Size = 10

Model | CUDA FP32 | OpenCV CPU
----- | --------- | ----------
GoogLeNet | 10.149ms | 75.9591ms
DenseNet121 | 20.269ms | 312.426ms
EAST Text Detection | 32.1556ms | 402.16ms
FastNeuralStyle StaryNight | 49.1025ms | 461.095ms
Inception 5h | 9.9721ms | 67.9308ms
MobileNet SSD | 96.2898ms | 110.783ms
OpenCV Face Detector | 22.7501ms | 77.8742ms
OpenPose Pose MPI | 118.858ms | 2321.89ms
Resnet 50 | 18.4139ms | 229.599ms
SqueezeNet | 4.4893ms | 22.3049ms
VGG16 SSD | 194.181ms | 1319.67ms
Yolo v3 | 122.603ms | 1044.11ms
Yolo v2 | 104.072ms | 819.177ms

Batch Size = 128

Model | CUDA FP32 | OpenCV CPU
----- | --------- | ----------
GoogLeNet | 90.3755ms | 775.769ms
DenseNet121 | 199.516ms | 3536.38ms
EAST Text Detection | 376.458ms | 7685.72ms
FastNeuralStyle StaryNight | 801.778ms | 6607.15ms
Inception 5h | 93.4188ms | 771.575ms
MobileNet SSD | 1028.93ms | 1110.37ms
OpenCV Face Detector | 276.992ms | 977.997ms
OpenPose Pose MPI | 1279.26ms | 32159.3ms
Resnet 50 | 200.789ms | 1719.92ms
SqueezeNet | 55.6244ms | 255.397ms
VGG16 SSD | 2969.05ms | 17201ms
Yolo v3 | 1564.78ms | 13699.2ms
Yolo v2 | 1362.84ms | 11254.9ms

Images processed per second (CUDA FP32)

Model | batch size = 1 | batch size = 10 | batch size = 128
----- | -------------- | --------------- | ----------------
GoogLeNet | 204 | 985 | 1416
DenseNet121 | 154 | 493 | 641
EAST Text Detection | 169 | 311 | 340
ENet | 217 | Not Applicable | Not Applicable
FastNeuralStyle StaryNight | 188 | 204 | 160
Inception 5h | 202 | 1002 | 1370
Inception v2 FasterRCNN | 12 | Not Applicable | Not Applicable
MobileNet SSD | 14 | 104 | 124
OpenCV Face Detector | 202 | 440 | 462
OpenPose Pose MPI | 33 | 84 | 100
Resnet 50 | 217 | 540 | 637
SqueezeNet | 918 | 2228 | 2301
VGG16 SSD | 42 | 52 | 43
Yolo v3 | 55 | 82 | 81
Yolo v2 | 82 | 96 | 93
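
The throughput figures appear to be derived from the CUDA FP32 latencies reported for the GTX 1080 Ti: batch size divided by the time for one forward pass over that batch. This relationship is inferred from the numbers, not stated in the thread. The sketch below reproduces the GoogLeNet row by dropping the fractional part; a few other rows differ by one image per second. `images_per_second` is a hypothetical helper.

```python
import math


def images_per_second(batch_size, latency_ms):
    """Images processed per second for one forward pass over a batch."""
    return math.floor(batch_size * 1000.0 / latency_ms)


# GoogLeNet CUDA FP32 latencies (ms) for batch sizes 1, 10 and 128.
googlenet = {1: 4.8824, 10: 10.149, 128: 90.3755}
rates = {batch: images_per_second(batch, ms) for batch, ms in googlenet.items()}
print(rates)
```

Read this way, the table shows how much of the GPU is left idle at batch size 1: for most models the rate keeps climbing as the batch grows, while models such as VGG16 SSD are already close to saturation.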

OpenCV CUDA vs TensorFlow

Batch Size 1

Model | CUDA FP32 | TensorFlow
----- | --------- | ----------
ResNet-50 | 4.5968ms | 7.1163ms
EAST Text Detection | 5.901ms | 8.6890ms

Batch Size 10

Model | CUDA FP32 | TensorFlow
----- | --------- | ----------
ResNet-50 | 18.4139ms | 22.3665ms
EAST Text Detection | 32.1556ms | 39.4857ms

Batch Size 128

Model | CUDA FP32 | TensorFlow
----- | --------- | ----------
ResNet-50 | 200.789ms | 216.3923ms
EAST Text Detection | 376.458ms | 421.8292ms
applied-machinelearning commented 4 years ago

@YashasSamaga Here is a simple python script that shows initialization problems with CUDA and multiprocessing in python. I have no idea if it works with c++ etc.

It also causes an error when setting the backend to CPU in the other process.

https://gist.github.com/applied-machinelearning/9462e1368065fd7bf93334b0130a6ba0

Starting detection on CPU from mainprocess
Results of detection on CPU from mainprocess
[[0.01141005 0.00488871 0.02028161 ... 0. 0. 0. ] ... [0.98707837 0.98927605 0.13942897 ... 0. 0. 0. ]]

Starting detection on CUDA from mainprocess
Results of detection on CUDA from mainprocess
[[0.01141005 0.00488872 0.02028161 ... 0. 0. 0. ] ... [0.98707837 0.98927605 0.13942908 ... 0. 0. 0. ]]

Starting detection on CPU from multiprocessing
terminate called after throwing an instance of 'cv::dnn::cuda4dnn::csl::cublas::cuBLASException'
what(): OpenCV(4.1.1-dev) /mnt/storage/opencv/opencv-cudadnn-really-working/modules/dnn/src/cuda4dnn/csl/cublas.hpp:63: error: (-217:Gpu API call) CUBLAS_STATUS_NOT_INITIALIZED in function 'UniqueHandle'

Starting detection on CUDA from multiprocessing
terminate called after throwing an instance of 'cv::dnn::cuda4dnn::csl::cublas::cuBLASException'
what(): OpenCV(4.1.1-dev) /mnt/storage/opencv/opencv-cudadnn-really-working/modules/dnn/src/cuda4dnn/csl/cublas.hpp:63: error: (-217:Gpu API call) CUBLAS_STATUS_NOT_INITIALIZED in function 'UniqueHandle'

alalek commented 4 years ago

multiprocessing in python

Perhaps we need to block fork() calls from Python: #5150. Calling multiprocessing.set_start_method('spawn') should help.
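
A minimal sketch of the spawn approach follows. It uses `multiprocessing.get_context("spawn")`, a per-context alternative to the global `set_start_method('spawn')` call suggested above; `spawn_context` and `run_in_child` are hypothetical helper names. A spawned child starts as a fresh interpreter instead of a forked copy of the parent, which is presumably why it avoids the initialization errors reported earlier.

```python
import multiprocessing as mp


def spawn_context():
    """Return a multiprocessing context that starts children with 'spawn'."""
    return mp.get_context("spawn")


def run_in_child(func, *args):
    """Run func(*args) in a single spawned worker and return its result.

    func must be defined at module level so the child can import it, and
    any network should be loaded inside func, i.e. inside the child.
    """
    with spawn_context().Pool(processes=1) as pool:
        return pool.apply(func, args)
```

In a script this would be called as `run_in_child(detect, "CUDA")` under an `if __name__ == "__main__":` guard, where `detect` is a hypothetical module-level function that loads the model and runs inference in the child process.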

applied-machinelearning commented 4 years ago

@alalek Just tested and that works for me !

Any ideas on: https://github.com/opencv/opencv/pull/14827#issuecomment-522737374 ?

YashasSamaga commented 4 years ago

@applied-machinelearning I have tested on several mobile and desktop GPUs but have not been able to reproduce the failures. It might be specific to Jetson but I don't have access to one (at least for now).

Can you try recloning (or maybe a hard reset?) repositories (opencv_extra and my fork) and run the tests again?

EDIT: I have verified with someone else. The tests are failing on Jetson Nano (with CUDA 10.0 and cuDNN 7.5.0).

The issue is not with CUDA 10.0 or CUDA 10.1 as I have tested both on desktop GPUs. I haven't used cuDNN other than 7.6.2. So it's either cuDNN or something wrong in my code which breaks only in Jetson Nano.

spacemud commented 4 years ago

I also did some testing with the Jetson Nano; here are my results.

dnn_tests.txt dnn_perf.txt

It appears that the SSD networks are failing in the tests.

applied-machinelearning commented 4 years ago

@YashasSamaga I retested with pristine cloned trees and downloads and still got these failing tests.

[ FAILED ] DNNTestNetwork.MobileNet_SSD_v2_TensorFlow/0, where GetParam() = CUDA/CUDA
[ FAILED ] DNNTestNetwork.SSD_VGG16/0, where GetParam() = CUDA/CUDA
[ FAILED ] DNNTestNetwork.Inception_v2_SSD_TensorFlow/0, where GetParam() = CUDA/CUDA
[ FAILED ] Test_TensorFlow_nets.Inception_v2_SSD/0, where GetParam() = CUDA/CUDA

I previously tried to compile for maxwell generation on my workstation (with pascal card) but that failed on fp16 support lacking with sm_50 and sm_52, but i forgot to test with sm_53. Now i did and that compiles. So with current code and no way to disable FP16 support, sm_53 is the lowest compute capability that compiles. I haven't checked yet what this means for the other (older) jetson boards. EDIT: just checked, Jetson TK1 = 3.2, jetson TX1 == maxwell (5.3), jetson TX2 == pascal, Jetson Xavier == volta. So only the old TK1 would be a problem on the embedded side of things. So either separating and making optional of FP16 support or just requiring sm_53 in CMake would seem sensible ?

Running the tests with opencv compiled for only sm_53 on the pascal card on X86 (cuda 10.1 cudnn 7.6.2) gives no failing tests, so the compute capability doesn't seem to matter.

So it's either something in cudnn 7.5.0, nvidia CUDA / cudnn libraries on arm64, or something in the opencv cudadnn code on arm64. Which makes me wonder, what is so special about these failed models that they fail and all other models and test succeed on the Jetson Nano ? Any special layers ? Any special layers which are CPU only ?

tompollok commented 4 years ago

@YashasSamaga thank you for your great contribution! May I ask if you know whether MaskRCNN will also work using your CUDA backend? It works using the OpenCV backend. Will your CUDA/cuDNN backend automatically fall back to another backend for layers that aren't supported? I added the pbtxt in case you're interested in which layers the network uses:

mask_rcnn_inception_v2_coco_2018_01_28.zip

YashasSamaga commented 4 years ago

@tompollok

The CUDA backend uses OpenCV CPU backend as a fallback for unsupported layers or layer configurations. The fallbacks are quite costly though.

I used your .pbtxt and the .pb from here.

Inception v2 Mask RCNN
OCV CPU Time:   3280ms
CUDA Total Time: 407ms
Relative Error >> Total: 0.787171, Average: 1.94363e-07, Max: 1.65757e-06

7700HQ and GTX 1050 were the devices used in the test. Every output value from the CUDA backend is compared against the corresponding output from the OpenCV CPU backend. The relative error is calculated as:

error = abs(x - y) / max(max(abs(x), abs(y)), eps) where eps = 1e-7
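
The formula above translates directly into code. In this sketch `relative_error` follows the stated definition, while `summarize` is a hypothetical helper that assumes the Total, Average and Max figures are the sum, mean and maximum of the per-element errors; the actual comparison script is not shown in the thread.

```python
def relative_error(x, y, eps=1e-7):
    """Relative error between two scalars, guarded against division by zero."""
    return abs(x - y) / max(max(abs(x), abs(y)), eps)


def summarize(cuda_out, cpu_out):
    """Total, average and maximum relative error over paired outputs."""
    errors = [relative_error(a, b) for a, b in zip(cuda_out, cpu_out)]
    total = sum(errors)
    return total, total / len(errors), max(errors)
```

The `eps` floor keeps the ratio defined when both outputs are zero, which is common in detection outputs padded with zeros.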

If you use the CUDA backend in a debug build, it will print the layers for which fallback is used. The CUDA backend uses OCV CPU fallback for the following layers:

[ INFO:0] global E:\Repositories\opencv\modules\dnn\src\dnn.cpp (1820) cv::dnn::dnn4_v20190621::Net::Impl::initCUDABackend CUDA backend will fallback to the CPU implementation for the layer "_input" of type __NetInputLayer__

[ INFO:0] global E:\Repositories\opencv\modules\dnn\src\dnn.cpp (1820) cv::dnn::dnn4_v20190621::Net::Impl::initCUDABackend CUDA backend will fallback to the CPU implementation for the layer "detection_out" of type DetectionOutput

[ INFO:0] global E:\Repositories\opencv\modules\dnn\src\dnn.cpp (1820) cv::dnn::dnn4_v20190621::Net::Impl::initCUDABackend CUDA backend will fallback to the CPU implementation for the layer "CropAndResize" of type CropAndResize

[ INFO:0] global E:\Repositories\opencv\modules\dnn\src\dnn.cpp (1820) cv::dnn::dnn4_v20190621::Net::Impl::initCUDABackend CUDA backend will fallback to the CPU implementation for the layer "detection_out_final" of type DetectionOutput

[ INFO:0] global E:\Repositories\opencv\modules\dnn\src\dnn.cpp (1820) cv::dnn::dnn4_v20190621::Net::Impl::initCUDABackend CUDA backend will fallback to the CPU implementation for the layer "CropAndResize_1" of type CropAndResize

I will be adding CropAndResize layer and NetInputLayer soon which should bring down the inference time considerably. Adding support for the detection output layer is a bit tricky because of NMS. I have tried but haven't been able to beat the CPU version of NMS. I'll hopefully get it working someday.

applied-machinelearning commented 4 years ago

If you use the CUDA backend in a debug build, it will print the layers for which fallback is used.

I think that info would also be helpful for non-debug builds (users of released distro versions). Could it be an idea to add the id of the backend used to getPerfProfile() (or have its own function to get that info per layer)?

alalek commented 4 years ago

info would also be helpful for non-debug builds

Environment variable OPENCV_LOG_LEVEL=INFO should help with messages in release builds.

tompollok commented 4 years ago

@tompollok

The CUDA backend uses OpenCV CPU backend as a fallback for unsupported layers or layer configurations. The fallbacks are quite costly though.

I used your .pbtxt and the .pb from here.

Inception v2 Mask RCNN
OCV CPU Time:   3280ms
CUDA Total Time: 407ms
Relative Error >> Total: 0.787171, Average: 1.94363e-07, Max: 1.65757e-06

7700HQ and GTX 1050 were the devices used in the test. Every output value from the CUDA backend is compared against the corresponding output from the OpenCV CPU backend. The relative error is calculated as:

error = abs(x - y) / max(max(abs(x), abs(y)), eps) where eps = 1e-7
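For illustration, the formula can be written out as a small function. This is a minimal sketch in plain Python; `summarize` is a hypothetical helper, and treating "Total" as the sum of the per-element errors is an assumption about how the reported figures were aggregated.

```python
def relative_error(x, y, eps=1e-7):
    """Relative error as defined above; eps guards against division by zero."""
    return abs(x - y) / max(max(abs(x), abs(y)), eps)


def summarize(cuda_out, cpu_out):
    """Aggregate per-element errors (assumption: 'Total' means their sum)."""
    errors = [relative_error(x, y) for x, y in zip(cuda_out, cpu_out)]
    return {
        "total": sum(errors),
        "average": sum(errors) / len(errors),
        "max": max(errors),
    }


# Made-up values, only to show the call; not real network outputs.
print(summarize([1.0, 2.0], [1.0, 1.0]))
```

Because of the eps floor, two values that are both very close to zero yield a small error instead of a division by zero.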

If you use the CUDA backend in a debug build, it will print the layers for which fallback is used. The CUDA backend uses OCV CPU fallback for the following layers:

[ INFO:0] global E:\Repositories\opencv\modules\dnn\src\dnn.cpp (1820) cv::dnn::dnn4_v20190621::Net::Impl::initCUDABackend CUDA backend will fallback to the CPU implementation for the layer "_input" of type __NetInputLayer__

[ INFO:0] global E:\Repositories\opencv\modules\dnn\src\dnn.cpp (1820) cv::dnn::dnn4_v20190621::Net::Impl::initCUDABackend CUDA backend will fallback to the CPU implementation for the layer "detection_out" of type DetectionOutput

[ INFO:0] global E:\Repositories\opencv\modules\dnn\src\dnn.cpp (1820) cv::dnn::dnn4_v20190621::Net::Impl::initCUDABackend CUDA backend will fallback to the CPU implementation for the layer "CropAndResize" of type CropAndResize

[ INFO:0] global E:\Repositories\opencv\modules\dnn\src\dnn.cpp (1820) cv::dnn::dnn4_v20190621::Net::Impl::initCUDABackend CUDA backend will fallback to the CPU implementation for the layer "detection_out_final" of type DetectionOutput

[ INFO:0] global E:\Repositories\opencv\modules\dnn\src\dnn.cpp (1820) cv::dnn::dnn4_v20190621::Net::Impl::initCUDABackend CUDA backend will fallback to the CPU implementation for the layer "CropAndResize_1" of type CropAndResize

I will be adding CropAndResize layer and NetInputLayer soon which should bring down the inference time considerably. Adding support for the detection output layer is a bit tricky because of NMS. I have tried but haven't been able to beat the CPU version of NMS. I'll hopefully get it working someday.

That's great news! Is there a way to list the single, average, or accumulated forward time in ms per layer, to see where bottlenecks may occur?

YashasSamaga commented 4 years ago

Currently, there is no simple way. The numbers returned by getPerfProfile are not accurate for the CUDA backend.

The backend has NVTX integration, which marks a region for each layer so that NVIDIA's profiling tools can show exactly how much time each layer takes.

The NVTX integration can be enabled by defining the CUDA4DNN_ENABLE_NVTX preprocessor symbol when building the DNN module.

The timings can also be computed within the code by adding events at the beginning and the end of every layer. This would allow getPerfProfile to return accurate timings. I need to investigate the performance impact (probably negligible) of using CUDA's event API for timing and then decide what to do.

YashasSamaga commented 4 years ago

forwardAsync for CUDA backend does not do what it does for IE.

For the CUDA backend, it dumps the operations to the device and returns immediately so that the calling thread can continue. It's not possible to call forwardAsync until the previous operations finish.

This overloads the meaning for forwardAsync. Any user can mimic this forwardAsync behaviour of CUDA backend on their own so I don't see why it should be a part of the CUDA backend.

I think I should revert https://github.com/opencv/opencv/pull/14827/commits/1154b9da9da07e9b52f8a81bdcea48cf31c56f70 or make it behave like IE forwardAsync (which isn't trivial).
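To illustrate the point that a user can mimic this behaviour on their own, here is a minimal sketch that runs a blocking forward call on a single worker thread and refuses a new request while the previous one is still running. `AsyncRunner` and `forward_fn` are hypothetical names; `forward_fn` stands in for a blocking call such as a network's forward pass, and whether a given network object may safely be used from a worker thread is an assumption to verify.

```python
from concurrent.futures import ThreadPoolExecutor


class AsyncRunner:
    """Runs a blocking forward function on one worker thread and rejects
    overlapping requests instead of silently producing wrong results."""

    def __init__(self, forward_fn):
        self._forward_fn = forward_fn
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = None

    def forward_async(self, blob):
        # Refuse to start a new request until the previous one has finished.
        if self._pending is not None and not self._pending.done():
            raise RuntimeError("previous request has not finished")
        self._pending = self._pool.submit(self._forward_fn, blob)
        return self._pending

    def close(self):
        self._pool.shutdown(wait=True)


# Stand-in for a real forward pass: doubles every input value.
runner = AsyncRunner(lambda blob: [2 * v for v in blob])
future = runner.forward_async([1, 2, 3])
print(future.result())  # blocks until the worker thread is done
runner.close()
```

The calling thread is free to do other work between `forward_async` and `result()`, which is the only guarantee this sketch offers; it does not pipeline multiple requests.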

davisking commented 4 years ago

forwardAsync for CUDA backend does not do what it does for IE.

For the CUDA backend, it dumps the operations to the device and returns immediately so that the calling thread can continue. It's not possible to call forwardAsync until the previous operations finish.

The calling thread can call forwardAsync again, but you mean that if they do that they will get the wrong results right?

This overloads the meaning for forwardAsync. Any user can mimic this forwardAsync behaviour of CUDA backend on their own so I don't see why it should be a part of the CUDA backend.

I think I should revert [1154b9d]

Assuming the above is all correct, reverting this and making forwardAsync simply unsupported for the CUDA backend seems like the right thing to do, since it's important that every method claiming to implement an interface (i.e. forwardAsync) actually conforms to the interface's contract.

YashasSamaga commented 4 years ago

@davisking Yes, if the calling thread calls forwardAsync before the previous request has been completed, they will get wrong results.


getPerfProfile()

I have tried using events to allow getPerfProfile to report timings accurately. The AlexNet benchmark time rose by 1 ms (from 5 ms to 6 ms, about 20%) when events were added at the beginning and the end of every layer. Hence, I think it's not a good idea to add it unless there is a way to optionally enable/disable profiling in the cv::dnn::Net interface.
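The idea of an opt-in profiling switch can be sketched as follows. This is only a host-side analogue: it uses `time.perf_counter` rather than CUDA events, and `ProfiledPipeline` is a hypothetical class, not an OpenCV API. It shows that the timing cost is paid only when profiling is switched on.

```python
import time


class ProfiledPipeline:
    """Runs named layer functions in order; per-layer timing is opt-in."""

    def __init__(self, layers, profiling=False):
        self.layers = layers          # list of (name, callable) pairs
        self.profiling = profiling
        self.timings_ms = {}

    def forward(self, x):
        for name, fn in self.layers:
            if self.profiling:
                # Host timer as a stand-in for a pair of CUDA events.
                start = time.perf_counter()
                x = fn(x)
                self.timings_ms[name] = (time.perf_counter() - start) * 1000.0
            else:
                x = fn(x)             # no timing overhead on this path
        return x


layers = [("double", lambda v: v * 2), ("increment", lambda v: v + 1)]
pipeline = ProfiledPipeline(layers, profiling=True)
print(pipeline.forward(3), sorted(pipeline.timings_ms))
```

On a GPU the work is asynchronous, so host timers like these would not measure kernel time without synchronization; that is why events are needed in the real backend.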


Why is MobileNet slow?

CPU: 7700HQ GPU: GTX 1050 Mobile

The timings are in milliseconds.

Layerwise timings:

LID | Layer Name | OCV CPU Time (ms) | CUDA Time (ms)
--- | --- | --- | ---
1 | conv0 | 0.7377 | 0.431744
2 | conv0/relu | fused | 0.111616
3 | conv1/dw | 3.6471 | 4.87014
4 | conv1/dw/relu | fused | 0.200704
5 | conv1 | 1.3591 | 0.4352
6 | conv1/relu | fused | 0.293888
7 | conv2/dw | 2.0382 | 1.29843
8 | conv2/dw/relu | fused | 0.169984
9 | conv2 | 0.8757 | 0.3072
10 | conv2/relu | fused | 0.193536
11 | conv3/dw | 3.6779 | 5.92691
12 | conv3/dw/relu | fused | 0.224256
13 | conv3 | 1.7081 | 0.396288
14 | conv3/relu | fused | 0.149504
15 | conv4/dw | 1.3985 | 1.60563
16 | conv4/dw/relu | fused | 0.151552
17 | conv4 | 1.0596 | 0.275456
18 | conv4/relu | fused | 0.164864
19 | conv5/dw | 2.2595 | 4.86093
20 | conv5/dw/relu | fused | 0.036864
21 | conv5 | 1.5585 | 0.229376
22 | conv5/relu | fused | 0.16384
23 | conv6/dw | 0.6994 | 2.93581
24 | conv6/dw/relu | fused | 0.01536
25 | conv6 | 0.4596 | 0.14336
26 | conv6/relu | fused | 0.13312
27 | conv7/dw | 1.3262 | 7.89914
28 | conv7/dw/relu | fused | 0.027648
29 | conv7 | 1.6013 | 0.254976
30 | conv7/relu | fused | 0.08272
31 | conv8/dw | 1.3864 | 8.4992
32 | conv8/dw/relu | fused | 0.027392
33 | conv8 | 1.6535 | 0.248832
34 | conv8/relu | fused | 0.140288
35 | conv9/dw | 1.2779 | 8.38554
36 | conv9/dw/relu | fused | 0.103424
37 | conv9 | 1.8505 | 0.320512
38 | conv9/relu | fused | 0.105472
39 | conv10/dw | 1.323 | 9.472
40 | conv10/dw/relu | fused | 0.123904
41 | conv10 | 1.7355 | 0.325632
42 | conv10/relu | fused | 0.116736
43 | conv11/dw | 1.5615 | 11.5087
44 | conv11/dw/relu | fused | 0.032768
45 | conv11 | 1.6546 | 0.240672
46 | conv11/relu | fused | 0.090112
47 | conv12/dw | 0.4947 | 11.1565
48 | conv12/dw/relu | fused | 0.036864
49 | conv12 | 1.1583 | 0.305152
50 | conv12/relu | fused | 0.088064
51 | conv13/dw | 0.9613 | 20.4667
52 | conv13/dw/relu | fused | 0.096256
53 | conv13 | 2.6287 | 0.557056
54 | conv13/relu | fused | 0.104448
55 | conv14_1 | 0.8994 | 0.310272
56 | conv14_1/relu | fused | 0.116736
57 | conv14_2 | 1.1877 | 0.355328
58 | conv14_2/relu | fused | 0.115712
59 | conv15_1 | 0.1326 | 0.208896
60 | conv15_1/relu | fused | 0.114688
61 | conv15_2 | 0.2258 | 0.267264
62 | conv15_2/relu | fused | 0.064512
63 | conv16_1 | 0.0521 | 0.186368
64 | conv16_1/relu | fused | 0.07168
65 | conv16_2 | 0.1443 | 0.229376
66 | conv16_2/relu | fused | 0.116736
67 | conv17_1 | 0.0238 | 0.216064
68 | conv17_1/relu | fused | 0.113664
69 | conv17_2 | 0.0444 | 0.128
70 | conv17_2/relu | fused | 0.116736
71 | conv11_mbox_loc | 0.1258 | 0.201728
72 | conv11_mbox_loc_perm | 0.0086 | 0.109568
73 | conv11_mbox_loc_flat | 0.003 | 0.094208
74 | conv11_mbox_conf | 0.3093 | 0.2048
75 | conv11_mbox_conf_perm | 0.0136 | 0.095232
76 | conv11_mbox_conf_flat | 0.002 | 0.101376
77 | conv11_mbox_priorbox | 0.0167 | 0.089088
78 | conv13_mbox_loc | 0.1002 | 0.265216
79 | conv13_mbox_loc_perm | 0.0061 | 0.110592
80 | conv13_mbox_loc_flat | 0.0016 | 0.070656
81 | conv13_mbox_conf | 0.3808 | 0.236544
82 | conv13_mbox_conf_perm | 0.0113 | 0.053248
83 | conv13_mbox_conf_flat | 0.0021 | 0.093184
84 | conv13_mbox_priorbox | 0.0102 | 0.101376
85 | conv14_2_mbox_loc | 0.0339 | 0.144384
86 | conv14_2_mbox_loc_perm | 0.0055 | 0.055296
87 | conv14_2_mbox_loc_flat | 0.0014 | 0.057344
88 | conv14_2_mbox_conf | 0.0987 | 0.147456
89 | conv14_2_mbox_conf_perm | 0.0068 | 0.054272
90 | conv14_2_mbox_conf_flat | 0.0016 | 0.057344
91 | conv14_2_mbox_priorbox | 0.0043 | 0.03376
92 | conv15_2_mbox_loc | 0.0194 | 0.079872
93 | conv15_2_mbox_loc_perm | 0.0036 | 0.05216
94 | conv15_2_mbox_loc_flat | 0.0012 | 0.06144
95 | conv15_2_mbox_conf | 0.0422 | 0.094208
96 | conv15_2_mbox_conf_perm | 0.0079 | 0.050176
97 | conv15_2_mbox_conf_flat | 0.0038 | 0.060768
98 | conv15_2_mbox_priorbox | 0.0053 | 0.065312
99 | conv16_2_mbox_loc | 0.0185 | 0.11776
100 | conv16_2_mbox_loc_perm | 0.0075 | 0.06144
101 | conv16_2_mbox_loc_flat | 0.0021 | 0.06656
102 | conv16_2_mbox_conf | 0.0321 | 0.117504
103 | conv16_2_mbox_conf_perm | 0.0066 | 0.050176
104 | conv16_2_mbox_conf_flat | 0.0019 | 0.048
105 | conv16_2_mbox_priorbox | 0.0037 | 0.055232
106 | conv17_2_mbox_loc | 0.0109 | 0.128
107 | conv17_2_mbox_loc_perm | 0.0055 | 0.059072
108 | conv17_2_mbox_loc_flat | 0.0016 | 0.090112
109 | conv17_2_mbox_conf | 0.0184 | 0.131072
110 | conv17_2_mbox_conf_perm | 0.0048 | 0.098304
111 | conv17_2_mbox_conf_flat | 0.0018 | 0.078848
112 | conv17_2_mbox_priorbox | 0.0027 | 0.075648
113 | mbox_loc | 0.0135 | 0.094112
114 | mbox_conf | 0.0244 | 0.060416
115 | mbox_priorbox | 0.0159 | 0.077824
116 | mbox_conf_reshape | 0.0021 | 0.078848
117 | mbox_conf_softmax | 1.0677 | 0.304128
118 | mbox_conf_flatten | 0.0044 | 0.0768
119 | detection_out | 1.1011 | 3.1438

Depthwise convolutions in cuDNN are insanely slow. So bad that the CPU takes 0.9ms and the GPU takes 20ms in conv13/dw. I have seen it launching a huge number of kernels. I suspect it launches one kernel per group, which is inefficient.

DetectionOutput is slow because it's performed on the CPU, which requires the data to be transferred from the GPU to the CPU.

The priorbox, concat and permute operations are slower on the GPU because the operands for those operations are too small. Many of these operations use just a single-digit or two-digit number of cores even though my GPU has 768 cores.
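To make the comparison concrete, the following sketch ranks a few layers by how much slower they are on the GPU. The rows are copied from the layerwise timings above; `slower_on_gpu` is a hypothetical helper. Note that the CPU figure for a convolution includes its fused ReLU while the CUDA figure does not, so the ratios slightly understate the gap.

```python
# (layer name, OCV CPU time in ms, CUDA time in ms), copied from the timings above.
ROWS = [
    ("conv11/dw", 1.5615, 11.5087),
    ("conv12/dw", 0.4947, 11.1565),
    ("conv13/dw", 0.9613, 20.4667),
    ("conv13", 2.6287, 0.557056),
    ("detection_out", 1.1011, 3.1438),
]


def slower_on_gpu(rows):
    """Return (name, cuda/cpu ratio) for layers where CUDA took longer,
    sorted with the worst slowdown first."""
    ratios = [(name, cuda / cpu) for name, cpu, cuda in rows if cuda > cpu]
    return sorted(ratios, key=lambda item: item[1], reverse=True)


for name, ratio in slower_on_gpu(ROWS):
    print(f"{name}: {ratio:.1f}x slower on the GPU")
```

The pointwise conv13 layer does not appear in the result because it is faster on the GPU; only the depthwise layers and detection_out are flagged.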

/cc @catree might be relevant as you were interested to know why MobileNet performs so badly

pfeatherstone commented 4 years ago

I'm getting the following build errors:

/home/peter/Downloads/opencv/modules/dnn/src/cuda/math.hpp(23): error: identifier "hexp" is undefined

/home/peter/Downloads/opencv/modules/dnn/src/cuda/math.hpp(24): error: identifier "h2exp" is undefined

/home/peter/Downloads/opencv/modules/dnn/src/cuda/math.hpp(29): error: identifier "hexp" is undefined

/home/peter/Downloads/opencv/modules/dnn/src/cuda/math.hpp(30): error: identifier "h2exp" is undefined

/home/peter/Downloads/opencv/modules/dnn/src/cuda/math.hpp(53): error: identifier "hlog" is undefined

/home/peter/Downloads/opencv/modules/dnn/src/cuda/math.hpp(54): error: identifier "h2log" is undefined

/home/peter/Downloads/opencv/modules/dnn/src/cuda/math.hpp(59): error: more than one conversion function from "half" to a built-in type applies: function "half::operator float() const" function "half::operator short() const" function "half::operator unsigned short() const" function "half::operator int() const" function "half::operator unsigned int() const" function "half::operator long long() const" function "half::operator unsigned long long() const" function "half::operator nv_bool() const"

YashasSamaga commented 4 years ago

@pfeatherstone Can you upload your CMakeCache.txt? What device do you have?

pfeatherstone commented 4 years ago

@YashasSamaga I'm running Ubuntu 18, CUDA 10.1, cuDNN 7 and I have two Titan X GPUs. CMakeCache.txt

YashasSamaga commented 4 years ago

The CUDA backend provides a half-precision (DNN_TARGET_CUDA_FP16) target to further accelerate DNN inference. It makes use of half-precision intrinsics, which are supported only on devices with compute capability (CC) 5.3 and above.

@pfeatherstone Your GPU's compute capability is 6.1 which is good enough. (Source: https://developer.nvidia.com/cuda-gpus#compute)

In your CMakeCache.txt, the CUDA_ARCH_BIN option appears to have been set to 3.0 3.5 3.7 5.0 5.2 6.0 6.1 7.0 7.5 (which I think is the default). Please change this option to 6.1 or a list of your choice where all architectures are CC 5.3+.
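As a small worked example, the default list can be filtered down to the architectures that meet the CC 5.3 requirement. `filter_arch_list` is a hypothetical helper written for this illustration; it compares versions as (major, minor) integer pairs rather than floats.

```python
def filter_arch_list(arch_bin, minimum="5.3"):
    """Keep only the compute capabilities in `arch_bin` (space-separated)
    that are at least `minimum`."""

    def as_tuple(version):
        major, minor = version.split(".")
        return (int(major), int(minor))

    kept = [a for a in arch_bin.split() if as_tuple(a) >= as_tuple(minimum)]
    return " ".join(kept)


default_list = "3.0 3.5 3.7 5.0 5.2 6.0 6.1 7.0 7.5"
print(filter_arch_list(default_list))
```

The filtered string (or simply 6.1 for a single Titan X class device, as suggested above) is what would be set as the CUDA_ARCH_BIN value when configuring CMake.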

After building you can try running the tests:


[TODO DONE] throw an error while configuring CMake for unsupported compute capabilities

This CC limitation is only due to the half-precision support. It's possible to have a build option to enable or disable half-precision support in the CUDA backend. But I wonder if this is of any use. I don't think people still use very old GPUs and hence such an option might not be very useful.

Avrohom commented 4 years ago

Hi @YashasSamaga ,

Thanks for your reply earlier on. I have changed CUDA_ARCH_BIN as per your suggestion. I am trying to compile on Windows 10, but I am still getting errors.

C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\gapi\src\compiler\gcompiler.cpp(115,1) : error C2512 : 'cv::gapi::GNetPackage': no appropriate default constructor available [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda 4dnn-csl-low\build\modules\world\opencv_world.vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\gapi\src\compiler/passes/passes.hpp(31,12) : message : see declaration of 'cv::gapi::GNetPackage' (compiling source file C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn -csl-low\modules\gapi\src\compiler\gcompiler.cpp) [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\mod ules\world\opencv_world.vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(280,1) : error C2589: '(': illegal t oken on right side of '::' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world .vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(281,1) : error C2062: type 'unknown- type' unexpected [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(281,1): error C2059 : syntax error : ')' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(590,38) : error C2589: '(': illegal token on right side of '::' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_worl d.vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(590,1) : error C2062: type 'unknown- type' unexpected [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(590,1): error C2059 : 
syntax error : ')' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(592,52): error C2059 : syntax error : ';' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(595,17) : error C2065: 'singleMean': undeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcx proj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(597,61) : error C2065: 'scale': unde clared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(597,1) : error C2109: subscript requ ires array or pointer type [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world .vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(597,79) : error C2065: 'scale': unde clared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(606,52) : error C2065: 'scale': unde clared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(606,1) : error C2109: subscript requ ires array or pointer type [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world .vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(606,70) : error C2065: 'scale': unde clared identifier 
[C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(634,38) : error C2589: '(': illegal token on right side of '::' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_worl d.vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(634,1) : error C2062: type 'unknown- type' unexpected [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(634,1): error C2059 : syntax error : ')' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(636,52): error C2059 : syntax error : ';' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(641,21) : error C2065: 'singleMean': undeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcx proj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(643,51) : error C2065: 'scale': unde clared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(643,1) : error C2109: subscript requ ires array or pointer type [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world .vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(658,55) : error C2065: 'scale': unde clared identifier 
[C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(658,1) : error C2109: subscript requ ires array or pointer type [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world .vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(666,21) : error C2065: 'singleMean': undeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcx proj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(668,65) : error C2065: 'scale': unde clared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(668,1) : error C2109: subscript requ ires array or pointer type [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world .vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(668,83) : error C2065: 'scale': unde clared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(682,56) : error C2065: 'scale': unde clared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(682,1) : error C2109: subscript requ ires array or pointer type [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world .vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(682,74) : error C2065: 'scale': unde clared identifier 
[C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(687,9): error C2059 : syntax error : 'return' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(772,37): error C4430 : missing type specifier - int assumed. message : C++ does not support default-int [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn -csl-low\build\modules\world\opencv_world.vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(772,37): error C2143 : syntax error : missing ',' before '&' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.v cxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(781,42) : error C2065: 'LayerPin': u ndeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxpr oj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(781,35) : error C2923: 'std::vector' : 'LayerPin' is not a valid template type argument for parameter '_Ty' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4 dnn-csl-low\build\modules\world\opencv_world.vcxproj] C:\Program Files (x86)\Microsoft Visual Studio\2019\Preview\VC\Tools\MSVC\14.23.28008\include\vector(352,37) : error C3 203: 'allocator': unspecialized class template can't be used as a template argument for template parameter '_Alloc', ex pected a real type (compiling source file C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dn n.cpp) [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(791,37): error C4430 : missing type 
specifier - int assumed. message : C++ does not support default-int [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn -csl-low\build\modules\world\opencv_world.vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(791,37): error C2143 : syntax error : missing ',' before '&' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.v cxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(803,30): error C4430 : missing type specifier - int assumed. message : C++ does not support default-int [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn -csl-low\build\modules\world\opencv_world.vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(803,30): error C2143 : syntax error : missing ',' before '&' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.v cxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(823,41): error C4430 : missing type specifier - int assumed. 
message : C++ does not support default-int [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn -csl-low\build\modules\world\opencv_world.vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(823,41): error C2143 : syntax error : missing ',' before '&' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.v cxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(834,46) : error C2065: 'LayerPin': u ndeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxpr oj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(834,39) : error C2923: 'std::vector' : 'LayerPin' is not a valid template type argument for parameter '_Ty' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4 dnn-csl-low\build\modules\world\opencv_world.vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(842,38): error C4430 : missing type specifier - int assumed. message : C++ does not support default-int [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn -csl-low\build\modules\world\opencv_world.vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(842,38): error C2143 : syntax error : missing ',' before '&' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.v cxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(888,42): error C2061 : syntax error : identifier 'LayerData' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.v cxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(979,32): error C4430 : missing type specifier - int assumed. 
message : C++ does not support default-int [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn -csl-low\build\modules\world\opencv_world.vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(979,32): error C2143 : syntax error : missing ',' before '&' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.v cxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(986,14) : error C2065: 'LayerPin': u ndeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxpr oj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(986,10) : error C2923: 'std::map': ' LayerPin' is not a valid template type argument for parameter '_Kty' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dn n-csl-low\build\modules\world\opencv_world.vcxproj] C:\Program Files (x86)\Microsoft Visual Studio\2019\Preview\VC\Tools\MSVC\14.23.28008\include\map(70,46) : error C3203: 'less': unspecialized class template can't be used as a template argument for template parameter '_Pr', expected a rea l type (compiling source file C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp) [C:\U sers\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj] C:\Program Files (x86)\Microsoft Visual Studio\2019\Preview\VC\Tools\MSVC\14.23.28008\include\map(70,83) : error C3203: 'pair': unspecialized class template can't be used as a template argument for template parameter '_Ty', expected a rea l type (compiling source file C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp) [C:\U sers\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj] C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(989,14) : error C2065: 'LayerPin': u ndeclared identifier 
[C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(989,24) : error C2065: 'LayerPin': undeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(989,10) : error C2923: 'std::map': 'LayerPin' is not a valid template type argument for parameter '_Kty' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(989,10) : error C2923: 'std::map': 'LayerPin' is not a valid template type argument for parameter '_Ty' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(990,14) : error C2065: 'LayerPin': undeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(990,10) : error C2923: 'std::map': 'LayerPin' is not a valid template type argument for parameter '_Kty' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(774,14) : error C2923: 'std::map': 'LayerPin' is not a valid template type argument for parameter '_Kty' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(772) : message : see declaration of 'LayerPin' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(774,34) : error C2955: 'std::map': use of class template requires template argument list [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Program Files (x86)\Microsoft Visual Studio\2019\Preview\VC\Tools\MSVC\14.23.28008\include\map(71) : message : see declaration of 'std::map' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(774,64) : error C2065: 'lp': undeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(775,29) : error C2663: 'std::_Tree<_Traits>::end': 2 overloads have no legal conversion for 'this' pointer [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(776,24) : error C2065: 'lp': undeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(783,1) : error C2662: 'allocator_traits::rebind_alloc<_Ty>>::size_type std::vector<_Ty,_Alloc>::size(void) noexcept const': cannot convert 'this' pointer from 'const std::vector' to 'const std::vector<_Ty,_Alloc> &' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(783,1) : message : Reason: cannot convert from 'const std::vector' to 'const std::vector<_Ty,_Alloc>' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(783,29) : message : Conversion requires a second user-defined-conversion operator or constructor [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Program Files (x86)\Microsoft Visual Studio\2019\Preview\VC\Tools\MSVC\14.23.28008\include\vector(1436,26) : message : see declaration of 'std::vector<_Ty,_Alloc>::size' (compiling source file C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp) [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(785,1) : error C2678: binary '[': no operator found which takes a left-hand operand of type 'const std::vector' (or there is no acceptable conversion) [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Program Files (x86)\Microsoft Visual Studio\2019\Preview\VC\Tools\MSVC\14.23.28008\include\vector(1461,27) : message : could be 'const _Ty &std::vector<_Ty,_Alloc>::operator [](const allocator_traits::rebind_alloc<_Ty>>::size_type) noexcept const' (compiling source file C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp) [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Program Files (x86)\Microsoft Visual Studio\2019\Preview\VC\Tools\MSVC\14.23.28008\include\vector(1451,21) : message : or '_Ty &std::vector<_Ty,_Alloc>::operator [](const allocator_traits::rebind_alloc<_Ty>>::size_type) noexcept' (compiling source file C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp) [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(785,1) : message : while trying to match the argument list '(const std::vector, int)' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(793,14) : error C2923: 'std::map': 'LayerPin' is not a valid template type argument for parameter '_Kty' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(791) : message : see declaration of 'LayerPin' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(793,14) : error C2923: 'std::map': 'LayerPin' is not a valid template type argument for parameter '_Ty' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(791) : message : see declaration of 'LayerPin' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(793,39) : error C2955: 'std::map': use of class template requires template argument list [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Program Files (x86)\Microsoft Visual Studio\2019\Preview\VC\Tools\MSVC\14.23.28008\include\map(71) : message : see declaration of 'std::map' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(793,70) : error C2065: 'lp': undeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(794,9) : error C2663: 'std::_Tree<_Traits>::end': 2 overloads have no legal conversion for 'this' pointer [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(795,18): error C2146 : syntax error : missing ';' before identifier 'memHost' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(795,18) : error C2065: 'memHost': undeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(797,14) : error C2923: 'std::map': 'LayerPin' is not a valid template type argument for parameter '_Kty' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(791) : message : see declaration of 'LayerPin' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(797,34) : error C2955: 'std::map': use of class template requires template argument list [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Program Files (x86)\Microsoft Visual Studio\2019\Preview\VC\Tools\MSVC\14.23.28008\include\map(71) : message : see declaration of 'std::map' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(797,67) : error C2065: 'memHost': undeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(798,9) : error C2663: 'std::_Tree<_Traits>::end': 2 overloads have no legal conversion for 'this' pointer [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(805,9) : error C2065: 'user': undeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(805,9) : error C2663: 'std::_Tree<_Traits>::end': 2 overloads have no legal conversion for 'this' pointer [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(806,9) : error C2065: 'host': undeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(806,9) : error C2663: 'std::_Tree<_Traits>::end': 2 overloads have no legal conversion for 'this' pointer [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(807,18): error C2146 : syntax error : missing ';' before identifier 'memHost' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(807,18) : error C2065: 'memHost': undeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(807,37) : error C2065: 'host': undeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(808,18) : error C2065: 'user': undeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(808,26) : error C2065: 'memHost': undeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(809,29) : error C2065: 'memHost': undeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(809,51) : error C2663: 'std::_Tree<_Traits>::end': 2 overloads have no legal conversion for 'this' pointer [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(811,18) : error C2923: 'std::map': 'LayerPin' is not a valid template type argument for parameter '_Kty' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]

YashasSamaga commented 4 years ago

@Avrohom Can you try building once again from a fresh clone? You seem to have errors from many modules.

I am able to build on my PC with VS17. What compiler are you using?

Avrohom commented 4 years ago

Hi @YashasSamaga,

Many thanks for your kind help. I will do a fresh clone. I was using VS19, which I had used to successfully build the official OpenCV master branch.

Avrohom commented 4 years ago

@YashasSamaga !

Nope. Not working. Did a fresh clone. Tried with VS2017.

C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(280): error C2589: '(': illegal token on right side of '::' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(281): error C2062: type 'unknown-type' unexpected [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(281): error C2059: syntax error: ')' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(590): error C2589: '(': illegal token on right side of '::' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(590): error C2062: type 'unknown-type' unexpected [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(590): error C2059: syntax error: ')' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(592): error C2059: syntax error: ';' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(595): error C2065: 'singleMean': undeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(597): error C2065: 'scale': undeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(597): error C2109: subscript requires array or pointer type [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(606): error C2065: 'scale': undeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(606): error C2109: subscript requires array or pointer type [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(634): error C2589: '(': illegal token on right side of '::' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(634): error C2062: type 'unknown-type' unexpected [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(634): error C2059: syntax error: ')' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(636): error C2059: syntax error: ';' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(641): error C2065: 'singleMean': undeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(643): error C2065: 'scale': undeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(643): error C2109: subscript requires array or pointer type [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(658): error C2065: 'scale': undeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(658): error C2109: subscript requires array or pointer type [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(666): error C2065: 'singleMean': undeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(668): error C2065: 'scale': undeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(668): error C2109: subscript requires array or pointer type [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(682): error C2065: 'scale': undeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(682): error C2109: subscript requires array or pointer type [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(687): error C2059: syntax error: 'return' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(772): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(772): error C2143: syntax error: missing ',' before '&' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(781): error C2065: 'LayerPin': undeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(781): error C2923: 'std::vector': 'LayerPin' is not a valid template type argument for parameter '_Ty' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(781): error C3203: 'allocator': unspecialized class template can't be used as a template argument for template parameter '_Alloc', expected a real type [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(791): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(791): error C2143: syntax error: missing ',' before '&' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(803): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(803): error C2143: syntax error: missing ',' before '&' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(823): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(823): error C2143: syntax error: missing ',' before '&' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(834): error C2065: 'LayerPin': undeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(834): error C2923: 'std::vector': 'LayerPin' is not a valid template type argument for parameter '_Ty' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(834): error C3203: 'allocator': unspecialized class template can't be used as a template argument for template parameter '_Alloc', expected a real type [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(842): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(842): error C2143: syntax error: missing ',' before '&' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(888): error C2061: syntax error: identifier 'LayerData' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(979): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(979): error C2143: syntax error: missing ',' before '&' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(986): error C2065: 'LayerPin': undeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(986): error C2923: 'std::map': 'LayerPin' is not a valid template type argument for parameter '_Kty' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(986): error C3203: 'less': unspecialized class template can't be used as a template argument for template parameter '_Pr', expected a real type [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\VC\Tools\MSVC\14.16.27023\include\map(79): error C3203: 'pair': unspecialized class template can't be used as a template argument for template parameter '_Ty', expected a real type (compiling source file C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp) [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(989): error C2065: 'LayerPin': undeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(989): error C2923: 'std::map': 'LayerPin' is not a valid template type argument for parameter '_Kty' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(989): error C2923: 'std::map': 'LayerPin' is not a valid template type argument for parameter '_Ty' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(989): error C3203: 'less': unspecialized class template can't be used as a template argument for template parameter '_Pr', expected a real type [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(990): error C2065: 'LayerPin': undeclared identifier [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(990): error C2923: 'std::map': 'LayerPin' is not a valid template type argument for parameter '_Kty' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(990): error C3203: 'less': unspecialized class template can't be used as a template argument for template parameter '_Pr', expected a real type [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(774): error C2923: 'std::map': 'LayerPin' is not a valid template type argument for parameter '_Kty' [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(772): note: see declaration of 'LayerPin'
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(774): error C3203: 'less': unspecialized class template can't be used as a template argument for template parameter '_Pr', expected a real type [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]
C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\modules\dnn\src\dnn.cpp(774): error C2955: 'std::map': use of class template requires template argument list [C:\Users\avrsi\source\repos\OpenCv\opencv-cuda4dnn-csl-low\build\modules\world\opencv_world.vcxproj]

alalek commented 4 years ago

@Avrohom

opencv_world.vcxproj

Why are you building in opencv_world mode? CUDA is not well supported in this mode.

Avrohom commented 4 years ago

@alalek. Well, I managed to get the official Master branch built in opencv_world mode. It included CUDA.

alalek commented 4 years ago

Looks like it is complaining about std::max, which is a common Windows problem: https://stackoverflow.com/questions/13416418/define-nominmax-using-stdmin-max/13420838. Try adding #define NOMINMAX somewhere (in precomp.hpp, or in CMake via add_definitions(/DNOMINMAX)).
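A minimal sketch of the NOMINMAX approach, assuming the define is placed ahead of the first Windows include in a translation unit. The helper names `clamp_to_zero`, `largest_int`, and `self_check` are hypothetical and exist only for illustration; this is not the actual contents of precomp.hpp.

```cpp
// Define NOMINMAX before any Windows header so that <windows.h>
// does not define the function-like min/max macros.
#ifndef NOMINMAX
#define NOMINMAX
#endif
#ifdef _WIN32
#include <windows.h>
#endif

#include <algorithm>
#include <cassert>
#include <limits>

// Hypothetical helper: with the macros suppressed, std::max parses normally.
inline int clamp_to_zero(int v) { return std::max(v, 0); }

// Hypothetical helper: numeric_limits<T>::max() is also hit by the macro
// when NOMINMAX is missing.
inline int largest_int() { return std::numeric_limits<int>::max(); }

// Small sanity check for the helpers above.
inline void self_check() {
    assert(clamp_to_zero(-5) == 0);
    assert(clamp_to_zero(7) == 7);
}
```

Defining the macro through the build system instead has the same effect without touching any source file, as long as the definition reaches every translation unit that includes a Windows header.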

Avrohom commented 4 years ago

Hi @alalek, @YashasSamaga,

Many thanks,

Yes, I had to repair that std::max issue. I didn't know where to #define NOMINMAX, though (I hadn't noticed your reply at that point), so I simply changed the calls to (std::max)(...).
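For reference, a minimal sketch of the parenthesized-call workaround. Wrapping the function name in parentheses stops the preprocessor from treating it as a function-like macro invocation, so the call compiles even when `<windows.h>` has defined `max`. The helper names `larger`, `int_limit`, and `check_parenthesized` are hypothetical, not taken from dnn.cpp.

```cpp
#ifdef _WIN32
#include <windows.h>  // without NOMINMAX this defines min/max macros
#endif

#include <algorithm>
#include <cassert>
#include <limits>

// (std::max)(a, b): the ')' right after the name means the preprocessor
// does not see "max(" and therefore does not expand the macro.
inline int larger(int a, int b) { return (std::max)(a, b); }

// The same trick applies to std::numeric_limits<T>::max().
inline int int_limit() { return (std::numeric_limits<int>::max)(); }

// Small sanity check for the helpers above.
inline void check_parenthesized() {
    assert(larger(3, 9) == 9);
    assert(larger(-1, -4) == -1);
}
```

This needs an edit at every call site, which is why a single project-wide define tends to be easier to maintain.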

I also manually changed the code in modules/gapi/include/opencv2/gapi/infer.hpp to apply the fix described at https://github.com/opencv/opencv/commit/212f0fb5093ff8353cce602084d60225061f79f1#diff-e56cd60011ddca49858f11ce28fd3c31.

I am working with the 'opencv-cuda4dnn-csl-low' branch because it looked to me to be the most up to date. Am I missing something? Which branch has the above fix been merged into?

I can now confirm that I have been able to build that repository successfully on Windows, including the opencv_world option.

Interestingly, when I test the debug build with Mask R-CNN inference, it takes approximately 1,150 ms per frame and I get a few messages about the library 'falling back' to the CPU for various operations. With the release build I get none of those 'fallback' messages and processing takes about 350 ms per frame. Such a big difference seems quite strange.

YashasSamaga commented 4 years ago

@Avrohom

I ran into the std::max/std::min issue after rebasing. I think a fix for MSVC should be added globally in CMake.

The debug build prevents many compiler optimizations which would improve the inference time. IMO, such a large difference in timings is normal.

The fallback messages are not displayed in release builds, but the fallbacks still happen. Refer to this comment to enable the messages in non-debug builds.

davisking commented 4 years ago

What are the next steps on this PR? I see we are waiting for reviews from @alalek and @dkurt. Is there anything @YashasSamaga still needs to complete that is holding up those reviews?