Do I have to use `CV_OVERRIDE` and `CV_FINAL`? I presume they were added for portability, but now that both `final` and `override` are keywords in C++11, should they still be used?
Can I use `std::shared_ptr` instead of `cv::Ptr`? There isn't a `make_shared` equivalent, and `makePtr` doesn't do what `std::make_shared` does.
Is it fine to force push occasionally when there isn't any dependent stuff like reviews in between?
CV_OVERRIDE and CV_FINAL

They are used to avoid excessive merge conflicts with the 3.4 branch. Since your code is in the master branch only, this problem does not apply, so you can use the C++ keywords/modifiers.
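For reference, a minimal sketch of what these macros amount to on a C++11 compiler (simplified from the definitions in OpenCV's cvdef.h; treat the exact guards as an assumption):

```cpp
// Simplified sketch: on C++11-capable compilers the macros map directly to
// the language keywords, so using the keywords in master-only code is safe.
#ifndef CV_OVERRIDE
#define CV_OVERRIDE override
#endif
#ifndef CV_FINAL
#define CV_FINAL final
#endif
```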
use std::shared_ptr instead of cv::Ptr

Feel free to use `std::shared_ptr` (but it is not supported by the bindings generator, so be careful with the public API).
makePtr doesn't do what std::make_shared does.

In the master branch it is just a wrapper, so it should do the same thing.
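A minimal sketch of what the master-branch wrapper amounts to (my simplification, not the actual header):

```cpp
#include <memory>
#include <utility>

// Sketch: in master, cv::Ptr is an alias for std::shared_ptr ...
template <typename T> using Ptr = std::shared_ptr<T>;

// ... and makePtr simply forwards to std::make_shared
template <typename T, typename... Args>
Ptr<T> makePtr(Args&&... args) {
    return std::make_shared<T>(std::forward<Args>(args)...);
}
```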
Is it fine to force push

It is OK. Also, rebasing is preferred over "merge" commits (it is easy to do with one squashed commit: squash first, then rebase).
Seems like it would be implementation-defined at worst, rather than UB. Are you sure it's UB? If it's OK in C++17 and works in our case, I think it's fine. I would be surprised if some compilers defined `std::iterator_traits<T>::iterator_category` for non-iterators in C++11.
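To make the concern concrete, here is a sketch of the kind of detection at stake (a C++11 void_t-style trait of my own; whether the pre-C++17 library makes the failure SFINAE-friendly is exactly the open question):

```cpp
#include <iterator>
#include <type_traits>

// Detects whether std::iterator_traits<T>::iterator_category names a type.
template <typename T, typename = void>
struct is_iterator : std::false_type {};

template <typename T>
struct is_iterator<T, typename std::conditional<false,
        typename std::iterator_traits<T>::iterator_category, void>::type>
    : std::true_type {};

static_assert(is_iterator<int*>::value, "pointers are iterators");
// In C++17, std::iterator_traits<int> is required to be an empty struct, so
// is_iterator<int>::value is simply false; before C++17, instantiating it
// could be a hard error rather than a substitution failure.
```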
First the good (or great): I have done some tests, and I can build OpenCV and successfully run both test and perf on my workstation with a "Pascal" generation card (compute capability 6.1) and CUDA 10.1 with gcc-8 as the compiler. It does generate quite a few warnings like:
In file included from /usr/local/cuda/include/cuda_fp16.h:2524:0, from /mnt/storage/src/opencv/opencv-cudadnn-buildtest/modules/dnn/src/cuda/activations.cu:6: /usr/local/cuda/include/cuda_fp16.hpp:279:6: warning: "__CUDA_ARCH__" is not defined, evaluates to 0 [-Wundef]
#if (__CUDA_ARCH__ >= 530 || !defined(__CUDA_ARCH__)) && !defined(__CUDA_NO_HALF2_OPERATORS__)
The included cuda_fp16.h and cuda_fp16.hpp have a lot of `__CUDA_ARCH__` ifdef'ery around fp16 datatype support for different compute capabilities. So somehow nvcc doesn't seem to have that macro defined while compiling the .cu files. However, if I read the manual, it states:
5.7.4. Virtual Architecture Identification Macro

The architecture identification macro `__CUDA_ARCH__` is assigned a three-digit value string xy0 (ending in a literal 0) during each nvcc compilation stage 1 that compiles for compute_xy. This macro can be used in the implementation of GPU functions for determining the virtual architecture for which it is currently being compiled. The host code (the non-GPU code) must not depend on it.
It looks like it should have been defined, as this is device and not host code? However, it doesn't seem to impact the build for this card generation / compute capability.
And now the somewhat less good: trying to build on an NVIDIA Jetson Nano (compute capability 5.3) fails; NVIDIA currently only provides an image with CUDA 10.0, cuDNN 7.3.1, and gcc-7 as the compiler. So I'm stuck on that for the moment.
I have attached the CMakeVars.txt and the complete build logs for both machines.
Pascal: opencv-cudadnn-buildtest-pascal-gcc.txt, CMakeVars.txt
Nano: opencv-cudadnn-buildtest-jetsonnano-gcc.txt, CMakeVars.txt
@applied-machinelearning That's surprising. I have two tests failing on my PC (and I expect them to fail on any PC).
The warnings are emitted by the CUDA headers, which I think are broken. That's what I can infer from this NVIDIA DevTalk post.
The PR uses the tensor transform API for adding asymmetric padding (required for the `same` padding mode) in the convolution layer. This API was added in cuDNN starting from version 7.5.0 (Release Notes). Hence, the build is failing on the nano.
The minimum version of cuDNN required is 7.5.0. The CMake configuration doesn't reflect this yet [TODO].
@davisking what should be the minimum version of cuDNN supported? The latest version is 7.6.2 and the current minimum is 7.5.0.
@alalek Is it possible to run the accuracy and performance tests on CI? The CUDA build doesn't seem to run the tests.
There are two warnings from an unrelated module:
/build/precommit_custom_linux/opencv_contrib/modules/cudalegacy/src/cuda/NCVBroxOpticalFlow.cu(1124): warning: variable "p_threads" was declared but never referenced
/build/precommit_custom_linux/opencv_contrib/modules/cudalegacy/src/cuda/NCVBroxOpticalFlow.cu(958): warning: variable "dThreads" was declared but never referenced
@applied-machinelearning That's surprising. I have two tests failing on my PC (and I expect them to fail on any PC).
Sorry for not being precise: some individual tests are failing, but the test and perf scripts finish without segmentation faults or other grave errors.
The minimum version of cuDNN required is 7.5.0. The CMake doesn't reflect this yet [TODO].
Thank you for clearing that up, although it's a pity, since at the moment that would rule out the Jetson embedded boards, for which NVIDIA also doesn't support OpenCL. Unfortunately, NVIDIA doesn't provide individual downloads/releases of its libraries for aarch64. Hopefully they will release new images with CUDA 10.1 and an updated cuDNN anytime soon.
@YashasSamaga Great progress!
Is it possible to run the accuracy and performance tests on CI
No, we don't have CUDA GPUs in CI. One of the problems is the prohibition on installing NVIDIA drivers for normal GPUs.
two warnings from an unrelated module
just ignore them (or feel free to prepare a separate PR which suppresses/eliminates these warnings, like #15267)
I think requiring 7.5.0 is fine.
NOTE: the devices used in the test are low-end mobile devices
CPU: i7 7700HQ
GPU: NVIDIA GTX 1050 Mobile
BLAS Library: MKL 2019.0.4
CUDA Version: 10.1
cuDNN: 7.6.2
Warmup Runs: 3 (forward pass is performed three times before benchmarks)
Benchmark Runs: 10 (the average of ten forward passes is reported)
Test Code: https://gist.github.com/YashasSamaga/71157cf0c3768c497e5e70fb95435596
Model | CUDA FP32 | Inference Engine CPU | OpenCV CPU |
---|---|---|---|
GoogLeNet | 7.2447ms | 10.4981ms | 17.9176ms |
DenseNet121 | 12.6324ms | 19.1823ms | 48.0628ms |
EAST Text Detection | 18.8281ms | 49.0508ms | 88.9429ms |
ENet | 11.5014ms | Exception | 62.5854ms |
FastNeuralStyle StaryNight | 27.498ms | 178.309ms | 160.359ms |
Inception 5h | 7.8546ms | 22.2789ms | 20.3255ms |
Inception v2 FasterRCNN | 112.736ms | Exception | 374.26ms |
MobileNet SSD | 58.4751ms | 9.2896ms | 27.3061ms |
OpenCV Face Detector | 6.9831ms | 8.3981ms | 17.6683ms |
OpenPose Pose MPI | 160.561ms | 509.446ms | 838.161ms |
Resnet 50 | 11.3603ms | 28.1529ms | 50.2752ms |
SqueezeNet | 2.4084ms | 3.2918ms | 5.476ms |
VGG16 SSD | 70.4117ms | 249.725ms | 360.207ms |
Yolo v3 | 57.9822ms | 214.629ms | 296.806ms |
Yolo v2 | 51.5784ms | 193.453ms | 260.19ms |
Model | CUDA FP32 | Inference Engine CPU | OpenCV CPU |
---|---|---|---|
GoogLeNet | 35.7556ms | 108.946ms | 225.928ms |
DenseNet121 | 74.9241ms | 295.105ms | 650.924ms |
EAST Text Detection | 149.58ms | 536.946ms | 1273.93ms |
FastNeuralStyle StaryNight | 283.173ms | 1966.5ms | 2175.3ms |
Inception 5h | 36.6225ms | 180.429ms | 233.276ms |
MobileNet SSD | 277.753ms | 111.872ms | 316.063ms |
OpenCV Face Detector | 52.4366ms | 95.7866ms | 202.657ms |
OpenPose Pose MPI | 628.617ms | 5650.05ms | 10683.5ms |
Resnet 50 | 74.283ms | 230.817ms | 541.308ms |
SqueezeNet | 15.8144ms | 35.4915ms | 69.4122ms |
VGG16 SSD | 594.286ms | 2796.23ms | 4661.51ms |
Yolo v3 | 488.704ms | 2419.8ms | 4209.74ms |
Yolo v2 | 491.414ms | 2185.47ms | 3788.34ms |
Model | CUDA FP32 | OpenCV OpenCL NVIDIA |
---|---|---|
GoogLeNet | 7.5951ms | 56.218ms |
DenseNet121 | 12.9375ms | 110.564ms |
EAST Text Detection | 19.1325ms | 309.341ms |
ENet | 11.8922ms | 38.8476ms |
FastNeuralStyle StaryNight | 29.69ms | 346.566ms |
Inception 5h | 8.8545ms | 57.4015ms |
Inception v2 FasterRCNN | 114.535ms | 2244.24ms |
MobileNet SSD | 57.6893ms | 148.459ms |
OpenCV Face Detector | 6.9666ms | 59.9923ms |
OpenPose Pose MPI | 162.01ms | 2377.14ms |
Resnet 50 | 11.9307ms | 176.066ms |
SqueezeNet | 2.4413ms | 14.6637ms |
VGG16 SSD | 70.8822ms | 1288.96ms |
Yolo v3 | 58.133ms | 1168.71ms |
Yolo v2 | 53.5697ms | 1016.73ms |
Model | OpenCV OpenCL IG | OpenCV OpenCL FP16 IG | OpenCV OpenCL NVIDIA |
---|---|---|---|
GoogLeNet | 15.5681ms | 11.7769ms | 56.218ms |
DenseNet121 | 49.4344ms | 56.3869ms | 110.564ms |
EAST Text Detection | 86.0381ms | 80.949ms | 309.341ms |
ENet | 27.4152ms | Exception | 38.8476ms |
FastNeuralStyle StaryNight | 105.712ms | 132.263ms | 346.566ms |
Inception 5h | 17.4537ms | 14.6988ms | 57.4015ms |
Inception v2 FasterRCNN | 358.018ms | 374.585ms | 2244.24ms |
MobileNet SSD | 20.5701ms | 21.3236ms | 148.459ms |
OpenCV Face Detector | 21.3481ms | 26.4779ms | 59.9923ms |
OpenPose Pose MPI | 888.518ms | 870.852ms | 2377.14ms |
Resnet 50 | 33.1333ms | 25.5099ms | 176.066ms |
SqueezeNet | 5.877ms | 5.865ms | 14.6637ms |
VGG16 SSD | 425.423ms | 353.651ms | 1288.96ms |
Yolo v3 | 339.913ms | 338.573ms | 1168.71ms |
Yolo v2 | 446.899ms | 314.487ms | 1016.73ms |
After I found an updated image for the Jetson Nano (CUDA 10.0, cuDNN 7.5.0), I have been able to compile OpenCV with cudadnn.
Unfortunately most tests fail with:
unknown file: Failure C++ exception with description "OpenCV(4.1.1-dev) /mnt/storage/src/opencv/opencv-cudadnn-buildtest/modules/dnn/src/cuda4dnn/csl/memory.hpp:263: error: (-217:Gpu API call) operation not supported in function 'MemoryLockGuard' " thrown in the test body.
I have seen this exception on my x86 / GTX 1060 as well when trying to run some Python scripts (some work, some fail with this error).
CMakeVars.txt buildlog.txt opencv-dnn-test-log.txt
BTW: very impressive benchmark results you posted!
This has something to do with Jetson not supporting the ability to page-lock already allocated memory. `MemoryLockGuard` essentially page-locks host memory. This boosts the host-to-device memory transfer bandwidth (it almost doubles on my PC) and also allows the transfer to happen asynchronously.
This also improves inference time when the network has layers which do not have CUDA implementations. It may be insignificant for large networks though.

But yes, it doesn't seem nice to rule out all the Jetson devices. I am currently thinking of adding a build option or maybe a runtime option (disabled by default) to prevent page-locking.

An alternative solution is to allocate page-locked memory up front instead of having to lock it later. But this is mostly not possible, as the host memory is allocated by the DNN backbone code, which is independent of the CUDA backend.
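For context, a minimal sketch of the two approaches using the standard CUDA runtime API (independent of this PR's actual `MemoryLockGuard` implementation):

```cpp
#include <cuda_runtime.h>
#include <cstdlib>

int main() {
    const size_t bytes = 1 << 20;

    // (a) pin an existing allocation after the fact: this is what a
    // MemoryLockGuard-style wrapper does, and it is the step that can fail
    // with "operation not supported" on platforms such as the Jetson boards
    void* host = std::malloc(bytes);
    if (cudaHostRegister(host, bytes, cudaHostRegisterDefault) == cudaSuccess)
        cudaHostUnregister(host);
    std::free(host);

    // (b) allocate page-locked memory up front instead, the alternative ruled
    // out above because the DNN backbone owns the allocation
    void* pinned = nullptr;
    if (cudaMallocHost(&pinned, bytes) == cudaSuccess)
        cudaFreeHost(pinned);
    return 0;
}
```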
Yes, there is a test failing on my x86 PC due to `MemoryLockGuard` attempting to lock memory which is already locked. I will be rolling out a fix for this soon. I am speculating that this is the error you are also facing.
Do you know what situation causes the error? What's different in the codes that work and that don't?
But yes, it doesn't seem nice to rule out all the Jetson devices. I am currently thinking of adding a build option or maybe a runtime-option (disabled by default) to prevent page-locking.
Could this perhaps be a build option auto-enabled on the ARM architecture, as the referenced post seems to suggest it is a problem on ARM only?
An alternate solution is to allocate page-locked memory during allocation instead of having to later lock it. But this is mostly not possible as the host memory is allocated by the DNN backbone code which is independent of the CUDA backend.
Yes, there is a test failing on my x86 PC due to `MemoryLockGuard` attempting to lock memory which is already locked. I will be rolling out a fix for this soon. I am speculating that this is the error you are also facing. Do you know what situation causes the error? What's different in the codes that work and that don't?
No unfortunately I haven't been able to figure it out. All the hunches I had, I incorporated in the simple test script, but that doesn't fail.
I won't be adding new code to the PR anymore. The commits from now on will be limited to bug fixes and refactoring. I will create separate PRs and improve upon this PR in the coming months.
@alalek @dkurt PR is ready for review.
Got some test and perf results for the jetson nano. opencv-dnn-perf-log.txt opencv-dnn-test-log.txt
The Deconvolution3D test is failing because the DNN backbone code has attempted to allocate an internal blob with zero size, which has been caught by an assertion in the CUDA backend. If I remember correctly, this is a recent issue which wasn't happening until I rebased a week ago.
/cc @alalek I am not sure if this is a bug. Deconvolution3D appears to request a zero-sized internal blob.
[TODO BUG] The CUDA backend has its own system of managing internal blobs and hence I shouldn't even be allocating these internal blobs. I'll skip the `wrap()` call for the CUDA backend. This should free up some (maybe significant) precious GPU memory.
The SSD networks aren't failing on my PC. The outputs certainly look wrong though. Can you try running the tests on your GTX 1060 and check if they fail? @applied-machinelearning
The SSD networks aren't failing on my PC. The outputs certainly look wrong though. Can you try running the tests on your GTX 1060 and check if they fail? @applied-machinelearning
You are correct, they aren't failing on my other machine. opencv-dnn-test-log-pascal.txt
Main differences (Jetson vs my workstation with a Pascal card):
- CUDA: 10.0 vs 10.1
- cuDNN: 7.5.0 vs 7.6.2
- compute capability: 5.3 vs 6.0 / 6.1 (Maxwell vs Pascal; note that normal Maxwell is 5.0 / 5.2, while the Jetson Nano's 5.3 seems to have different handling around the FP16 stuff)
- arch: aarch64 vs x86 (thinking out loud: an issue with something like endianness when handing over to CPU layers?)
Another thing I noticed: I tried to compile with compute capability 5.0 / 5.2 on my workstation (to rule out whether the compute capability has anything to do with the above problem), but that fails on the __half stuff, which seems to be necessary only for CUDA_FP16. If I remember correctly, CUDA_FP16 only makes sense performance-wise for later cards, so perhaps the whole CUDA_FP16 path should depend on the compute capability being at least Pascal? That would probably make all these compile errors go away. See the build log: buildlog.txt
Good news: commit e4e6759 seems to have fixed the last failing test: Test_ONNX_layers.Deconvolution3D/0, where GetParam() = CUDA/CUDA
I have done some more digging into the Python problem. After the commit "ignore memory lock failures" it now fails on: cv2.error: OpenCV(4.1.1-dev) /mnt/storage/opencv/opencv-cudadnn/modules/dnn/src/cuda4dnn/csl/memory.hpp:54: error: (-217:Gpu API call) initialization error in function 'ManagedPtr'.
So I instrumented that with cudaMemGetInfo and the requested malloc size. When using the simple Python script it functions properly:
cudaMemGetInfo free: 6103695360 total: 6373179392 Malloc size: 2076672
But when trying the more involved script it fails:
cudaMemGetInfo free: 0 total: 0 Malloc size: 2076672
So it seems CUDA isn't initialized properly. That is probably due to the more involved script being multiprocess / multithreaded. I will see if I can make a minimal Python script that exhibits the problem.
Hello, thank you for your efforts.
I got successful tests on the platforms below with my 300x300px trained Inception v2 SSD TensorFlow model:
Test Platform # 1 : GTX-1050-TI & i7-7700HQ & Win10 : ~22 fps on CPU (DNN_BACKEND_OPENCV & DNN_TARGET_CPU), ~49 fps on CPU with IE (DNN_BACKEND_INFERENCE_ENGINE & DNN_TARGET_CPU), ~46 fps on CUDA GPU (DNN_BACKEND_CUDA & DNN_TARGET_CUDA)
Test Platform # 2 : GTX-1050-Mobile & i5-8300H & Win10 : ~9 fps on CPU (DNN_BACKEND_OPENCV & DNN_TARGET_CPU), ~21 fps on CPU with IE (DNN_BACKEND_INFERENCE_ENGINE & DNN_TARGET_CPU), ~41 fps on CUDA GPU (DNN_BACKEND_CUDA & DNN_TARGET_CUDA)
@YashasSamaga Awesome work!
Any idea why CUDA MobileNet SSD performs badly? Probably some missing layers in the CUDA backend?
What could be the reasons for the bad performance of the OpenCL backend on NVIDIA? It performs worse than OpenCL on Intel IG.
Is it because the OpenCL drivers on NVIDIA are suboptimal? Is it because the kernels are tuned for Intel IG? A combination of both, maybe?
Any idea why CUDA MobileNet SSD performs badly? Probably some missing layers in the CUDA backend?
cuDNN performs very poorly for depthwise convolutions. It launches thousands of kernels. Hopefully, this will be fixed in a future version of cuDNN.
The only missing layer is DetectionOutputLayer, which doesn't take a toll on the performance as it appears at the end of the network. In fact, it's generally faster on the CPU than on a CUDA device (it's also partly because of my inability to write a kernel which can outperform the CPU).
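For reference, a depthwise convolution reaches cuDNN as a grouped convolution; a hedged sketch of the relevant descriptor setup (cuDNN 7 API):

```cpp
#include <cudnn.h>

// A depthwise convolution is a grouped convolution with one group per
// channel; the slowness discussed above is in how cuDNN executes such
// configurations (e.g. one kernel launch per group), not in this setup.
void configure_depthwise(cudnnConvolutionDescriptor_t conv, int channels) {
    cudnnSetConvolutionGroupCount(conv, channels);
}
```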
What could be the reasons of the bad performance of OpenCL backend on Nvidia? It performs worse than OpenCL on Intel IG. Is it because the OpenCL drivers on Nvidia are suboptimal? Is it because the kernels are tuned for Intel IG? A combination of both maybe?
The OpenCV backend's OCL implementation frequently uses the CPU target as a fallback. This target switch is very cheap as the IG and CPU share the same memory.
NVIDIA devices have their own dedicated graphics memory. Every time a fallback is used, you'll have to transfer the memory from the device to the host. This is very costly. So costly that any benefits that are to be gained are completely outweighed by the cost of the intermediate memory transfers.
OpenCL limits the ability to exploit the full capability of CUDA devices. It offers far less control than what a CUDA backend could have.
CPU: 2x Intel Xeon E5-2640 v4 (40 logical cores)
GPU: 1x NVIDIA GTX 1080 Ti (11 GB)
CUDA Version: 10.0
cuDNN: 7.6.2
Warmup Runs: 3 (forward pass is performed three times before benchmarks)
Benchmark Runs: 10 (the average of ten forward passes is reported)
Test Code: https://gist.github.com/YashasSamaga/71157cf0c3768c497e5e70fb95435596
Batch size = 1:

Model | CUDA FP32 | OpenCV CPU |
---|---|---|
GoogLeNet | 4.8824ms | 14.2981ms |
DenseNet121 | 6.4555ms | 57.8244ms |
EAST Text Detection | 5.901ms | 67.4301ms |
ENet | 4.5979ms | 30.2767ms |
FastNeuralStyle StaryNight | 5.3193ms | 51.3313ms |
Inception 5h | 4.9487ms | 16.0048ms |
Inception v2 FasterRCNN | 82.0298ms | 179.245ms |
MobileNet SSD | 70.9177ms | 23.9348ms |
OpenCV Face Detector | 4.9288ms | 15.4205ms |
OpenPose Pose MPI | 30.5954ms | 246.747ms |
Resnet 50 | 4.5968ms | 45.1153ms |
SqueezeNet | 1.0888ms | 3.6492ms |
VGG16 SSD | 23.5926ms | 194.976ms |
Yolo v3 | 18.0002ms | 141.861ms |
Yolo v2 | 12.1279ms | 111.642ms |
Batch size = 10:

Model | CUDA FP32 | OpenCV CPU |
---|---|---|
GoogLeNet | 10.149ms | 75.9591ms |
DenseNet121 | 20.269ms | 312.426ms |
EAST Text Detection | 32.1556ms | 402.16ms |
FastNeuralStyle StaryNight | 49.1025ms | 461.095ms |
Inception 5h | 9.9721ms | 67.9308ms |
MobileNet SSD | 96.2898ms | 110.783ms |
OpenCV Face Detector | 22.7501ms | 77.8742ms |
OpenPose Pose MPI | 118.858ms | 2321.89ms |
Resnet 50 | 18.4139ms | 229.599ms |
SqueezeNet | 4.4893ms | 22.3049ms |
VGG16 SSD | 194.181ms | 1319.67ms |
Yolo v3 | 122.603ms | 1044.11ms |
Yolo v2 | 104.072ms | 819.177ms |
Batch size = 128:

Model | CUDA FP32 | OpenCV CPU |
---|---|---|
GoogLeNet | 90.3755ms | 775.769ms |
DenseNet121 | 199.516ms | 3536.38ms |
EAST Text Detection | 376.458ms | 7685.72ms |
FastNeuralStyle StaryNight | 801.778ms | 6607.15ms |
Inception 5h | 93.4188ms | 771.575ms |
MobileNet SSD | 1028.93ms | 1110.37ms |
OpenCV Face Detector | 276.992ms | 977.997ms |
OpenPose Pose MPI | 1279.26ms | 32159.3ms |
Resnet 50 | 200.789ms | 1719.92ms |
SqueezeNet | 55.6244ms | 255.397ms |
VGG16 SSD | 2969.05ms | 17201ms |
Yolo v3 | 1564.78ms | 13699.2ms |
Yolo v2 | 1362.84ms | 11254.9ms |
Throughput in images per second:

Model | batch size = 1 | batch size = 10 | batch size = 128 |
---|---|---|---|
GoogLeNet | 204 | 985 | 1416 |
DenseNet121 | 154 | 493 | 641 |
EAST Text Detection | 169 | 311 | 340 |
ENet | 217 | Not Applicable | Not Applicable |
FastNeuralStyle StaryNight | 188 | 204 | 160 |
Inception 5h | 202 | 1002 | 1370 |
Inception v2 FasterRCNN | 12 | Not Applicable | Not Applicable |
MobileNet SSD | 14 | 104 | 124 |
OpenCV Face Detector | 202 | 440 | 462 |
OpenPose Pose MPI | 33 | 84 | 100 |
Resnet 50 | 217 | 540 | 637 |
SqueezeNet | 918 | 2228 | 2301 |
VGG16 SSD | 42 | 52 | 43 |
Yolo v3 | 55 | 82 | 81 |
Yolo v2 | 82 | 96 | 93 |
Batch size = 1:

Model | CUDA FP32 | TensorFlow |
---|---|---|
ResNet-50 | 4.5968ms | 7.1163ms |
EAST Text Detection | 5.901ms | 8.6890ms |
Batch size = 10:

Model | CUDA FP32 | TensorFlow |
---|---|---|
ResNet-50 | 18.4139ms | 22.3665ms |
EAST Text Detection | 32.1556ms | 39.4857ms |
Batch size = 128:

Model | CUDA FP32 | TensorFlow |
---|---|---|
ResNet-50 | 200.789ms | 216.3923ms |
EAST Text Detection | 376.458ms | 421.8292ms |
@YashasSamaga Here is a simple Python script that shows initialization problems with CUDA and multiprocessing in Python. I have no idea if it also applies to C++ etc.
It also causes an error when setting the backend to CPU in the other process.
https://gist.github.com/applied-machinelearning/9462e1368065fd7bf93334b0130a6ba0
Starting detection on CPU from mainprocess Results of detection on CPU from mainprocess [[0.01141005 0.00488871 0.02028161 ... 0. 0. 0. ] ... [0.98707837 0.98927605 0.13942897 ... 0. 0. 0. ]]
Starting detection on CUDA from mainprocess Results of detection on CUDA from mainprocess [[0.01141005 0.00488872 0.02028161 ... 0. 0. 0. ] ... [0.98707837 0.98927605 0.13942908 ... 0. 0. 0. ]]
Starting detection on CPU from multiprocessing terminate called after throwing an instance of 'cv::dnn::cuda4dnn::csl::cublas::cuBLASException' what(): OpenCV(4.1.1-dev) /mnt/storage/opencv/opencv-cudadnn-really-working/modules/dnn/src/cuda4dnn/csl/cublas.hpp:63: error: (-217:Gpu API call) CUBLAS_STATUS_NOT_INITIALIZED in function 'UniqueHandle'
Starting detection on CUDA from multiprocessing terminate called after throwing an instance of 'cv::dnn::cuda4dnn::csl::cublas::cuBLASException' what(): OpenCV(4.1.1-dev) /mnt/storage/opencv/opencv-cudadnn-really-working/modules/dnn/src/cuda4dnn/csl/cublas.hpp:63: error: (-217:Gpu API call) CUBLAS_STATUS_NOT_INITIALIZED in function 'UniqueHandle'
multiprocessing in python

Perhaps we need to block `fork()` calls from Python: #5150

`multiprocessing.set_start_method('spawn')` should help.
@alalek Just tested, and that works for me!
Any ideas on: https://github.com/opencv/opencv/pull/14827#issuecomment-522737374 ?
@applied-machinelearning I have tested on several mobile and desktop GPUs but have not been able to reproduce the failures. It might be specific to Jetson but I don't have access to one (at least for now).
Can you try recloning (or maybe a hard reset?) the repositories (opencv_extra and my fork) and running the tests again?
EDIT: I have verified with someone else. The tests are failing on Jetson Nano (with CUDA 10.0 and cuDNN 7.5.0).
The issue is not with CUDA 10.0 or CUDA 10.1 as I have tested both on desktop GPUs. I haven't used cuDNN other than 7.6.2. So it's either cuDNN or something wrong in my code which breaks only in Jetson Nano.
I also did some testing with the Jetson Nano; here are my results.
It appears that the SSD networks are failing in the tests.
@YashasSamaga I retested with pristine cloned trees and downloads and still got these failing tests:
[ FAILED ] DNNTestNetwork.MobileNet_SSD_v2_TensorFlow/0, where GetParam() = CUDA/CUDA
[ FAILED ] DNNTestNetwork.SSD_VGG16/0, where GetParam() = CUDA/CUDA
[ FAILED ] DNNTestNetwork.Inception_v2_SSD_TensorFlow/0, where GetParam() = CUDA/CUDA
[ FAILED ] Test_TensorFlow_nets.Inception_v2_SSD/0, where GetParam() = CUDA/CUDA
I previously tried to compile for the Maxwell generation on my workstation (with a Pascal card), but that failed on the lack of fp16 support with sm_50 and sm_52; I forgot to test with sm_53. Now I did, and that compiles. So with the current code and no way to disable FP16 support, sm_53 is the lowest compute capability that compiles. I haven't checked yet what this means for the other (older) Jetson boards. EDIT: just checked, Jetson TK1 = 3.2, Jetson TX1 = Maxwell (5.3), Jetson TX2 = Pascal, Jetson Xavier = Volta. So only the old TK1 would be a problem on the embedded side of things. So either separating FP16 support and making it optional, or just requiring sm_53 in CMake, would seem sensible?
Running the tests with OpenCV compiled only for sm_53 on the Pascal card on x86 (CUDA 10.1, cuDNN 7.6.2) gives no failing tests, so the compute capability doesn't seem to matter.
So it's either something in cuDNN 7.5.0, in the NVIDIA CUDA / cuDNN libraries on arm64, or in the OpenCV cudadnn code on arm64. Which makes me wonder: what is so special about these failed models that they fail while all other models and tests succeed on the Jetson Nano? Any special layers? Any special layers which are CPU-only?
@YashasSamaga thank you for your great contribution! May I ask if you know whether Mask R-CNN will also work using your CUDA backend? It works using the OpenCV backend. Will your CUDA/cuDNN backend automatically fall back to another backend for layers that aren't supported? I added the pbtxt in case you're interested in what layers the network uses:
@tompollok
The CUDA backend uses OpenCV CPU backend as a fallback for unsupported layers or layer configurations. The fallbacks are quite costly though.
I used your .pbtxt and the .pb from here.
Inception v2 Mask RCNN
OCV CPU Time: 3280ms
CUDA Total Time: 407ms
Relative Error >> Total: 0.787171, Average: 1.94363e-07, Max: 1.65757e-06
7700HQ and GTX 1050 were the devices used in the test. Every output value from the CUDA backend is compared against the corresponding output from the OpenCV CPU backend. The relative error is calculated as:
error = abs(x - y) / max(max(abs(x), abs(y)), eps) where eps = 1e-7
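Expressed as code (a direct transcription of the formula above):

```cpp
#include <algorithm>
#include <cmath>

// Relative error between a CUDA-backend output x and the CPU reference y;
// eps guards against division by zero when both values are close to zero.
double relative_error(double x, double y, double eps = 1e-7) {
    return std::abs(x - y) / std::max({std::abs(x), std::abs(y), eps});
}
```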
If you use the CUDA backend in a debug build, it will print the layers for which fallback is used. The CUDA backend uses OCV CPU fallback for the following layers:
[ INFO:0] global E:\Repositories\opencv\modules\dnn\src\dnn.cpp (1820) cv::dnn::dnn4_v20190621::Net::Impl::initCUDABackend CUDA backend will fallback to the CPU implementation for the layer "_input" of type __NetInputLayer__
[ INFO:0] global E:\Repositories\opencv\modules\dnn\src\dnn.cpp (1820) cv::dnn::dnn4_v20190621::Net::Impl::initCUDABackend CUDA backend will fallback to the CPU implementation for the layer "detection_out" of type DetectionOutput
[ INFO:0] global E:\Repositories\opencv\modules\dnn\src\dnn.cpp (1820) cv::dnn::dnn4_v20190621::Net::Impl::initCUDABackend CUDA backend will fallback to the CPU implementation for the layer "CropAndResize" of type CropAndResize
[ INFO:0] global E:\Repositories\opencv\modules\dnn\src\dnn.cpp (1820) cv::dnn::dnn4_v20190621::Net::Impl::initCUDABackend CUDA backend will fallback to the CPU implementation for the layer "detection_out_final" of type DetectionOutput
[ INFO:0] global E:\Repositories\opencv\modules\dnn\src\dnn.cpp (1820) cv::dnn::dnn4_v20190621::Net::Impl::initCUDABackend CUDA backend will fallback to the CPU implementation for the layer "CropAndResize_1" of type CropAndResize
I will be adding the CropAndResize layer and NetInputLayer soon, which should bring down the inference time considerably. Adding support for the detection output layer is a bit tricky because of NMS. I have tried but haven't been able to beat the CPU version of NMS. I'll hopefully get it working someday.
If you use the CUDA backend in a debug build, it will print the layers for which fallback is used.
I think that info would also be helpful for non-debug builds (users of released distro versions). Could it be an idea to add the id of the backend used to getPerfProfile() (or have its own function to get that info per layer)?
info would also be helpful for non-debug builds

The environment variable `OPENCV_LOG_LEVEL=INFO` should help with messages in release builds.
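Alternatively (an assumption on my part, not something stated in this thread), the log level can be raised programmatically via the logging utilities in recent OpenCV versions:

```cpp
#include <opencv2/core/utils/logger.hpp>

int main() {
    // equivalent in effect to OPENCV_LOG_LEVEL=INFO for this process
    cv::utils::logging::setLogLevel(cv::utils::logging::LOG_LEVEL_INFO);
    // ... build the network and call forward(); fallback messages now appear
    return 0;
}
```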
I will be adding CropAndResize layer and NetInputLayer soon which should bring down the inference time considerably. Adding support for the detection output layer is a bit tricky because of NMS. I have tried but haven't been able to beat the CPU version of NMS. I'll hopefully get it working someday.
That's great news! Is there a way to list the single, average, or accumulated forwarding time in ms per layer when forwarding, to see where bottlenecks may occur?
Currently, there is no simple way. The numbers returned by `getPerfProfile` are not accurate for the CUDA backend.

The backend has NVTX integration, which marks regions of layers and allows you to identify exactly the time taken by each layer in NVIDIA's profiling tools. It can be enabled by defining the `CUDA4DNN_ENABLE_NVTX` preprocessor symbol while building the DNN module.

The timings can be computed for the layers within the code by adding events at the beginning and the end of every layer. This would allow `getPerfProfile` to return accurate timings. I need to investigate the performance impact (mostly negligible) of using CUDA's event API for timing and then decide what to do.
`forwardAsync` for the CUDA backend does not do what it does for IE.

For the CUDA backend, it dumps the operations to the device and returns immediately so that the calling thread can continue. It's not possible to call `forwardAsync` until the previous operations finish.

This overloads the meaning of `forwardAsync`. Any user can mimic this `forwardAsync` behaviour of the CUDA backend on their own, so I don't see why it should be a part of the CUDA backend.

I think I should revert https://github.com/opencv/opencv/pull/14827/commits/1154b9da9da07e9b52f8a81bdcea48cf31c56f70 or make it behave like IE `forwardAsync` (which isn't trivial).
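As a rough illustration of "any user can mimic this on their own", a sketch (my own, not code from the PR) that launches the blocking `forward()` on a worker thread:

```cpp
#include <future>
#include <opencv2/dnn.hpp>

// Runs the synchronous forward() on another thread so the caller can
// continue, roughly the behaviour described above. The same caveat applies:
// don't submit another request before the future is ready.
std::future<cv::Mat> forward_on_worker(cv::dnn::Net& net) {
    return std::async(std::launch::async, [&net] { return net.forward(); });
}
```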
`forwardAsync` for CUDA backend does not do what it does for IE. For the CUDA backend, it dumps the operations to the device and returns immediately so that the calling thread can continue. It's not possible to call `forwardAsync` until the previous operations finish.

The calling thread can call `forwardAsync` again, but you mean that if they do that they will get the wrong results, right?
This overloads the meaning of `forwardAsync`. Any user can mimic this `forwardAsync` behaviour of the CUDA backend on their own, so I don't see why it should be a part of the CUDA backend. I think I should revert [1154b9d]

Assuming the above is all correct, reverting this and making `forwardAsync` just not supported for the CUDA backend seems like the right thing to do, since it's important for methods that claim to implement an interface (i.e. `forwardAsync`) to actually conform to the interface's contract.
@davisking Yes, if the calling thread calls `forwardAsync` before the previous request has been completed, they will get wrong results.
getPerfProfile()

I have tried using events to allow `getPerfProfile` to report timings accurately. The AlexNet benchmark time shot up by 1ms (6ms from 5ms) when events were added at the beginning and the end of every layer. Hence, I think it's not a good idea to add it unless there is a way to optionally enable/disable profiling in the `cv::Net` interface.
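For reference, a sketch of the event-based per-layer timing being discussed (plain CUDA runtime API; the ~1ms overhead above comes from inserting something like this around every layer):

```cpp
#include <cuda_runtime.h>

// Times an arbitrary span of work submitted to `stream` using CUDA events.
// The synchronize forces the host to wait, which is part of the overhead.
float time_span_ms(cudaStream_t stream) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, stream);
    // ... enqueue the layer's kernels on `stream` here ...
    cudaEventRecord(stop, stream);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}
```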
CPU: 7700HQ
GPU: GTX 1050 Mobile
The timings are in milliseconds.
Depthwise convolutions in cuDNN are insanely slow. So bad that the CPU takes 0.9ms and the GPU takes 20ms in `conv13/dw`. I have seen it launching a huge number of kernels. I suspect it launches one kernel per group, which is inefficient.
DetectionOutput is slow because it's performed on the CPU, which requires the data to be transferred from the GPU to the CPU.

The priorbox, concat, and permute operations are slower on the GPU because the operands of those operations are too small. Many of these operations use just a one- or two-digit number of cores even though my GPU has 768 cores.
/cc @catree might be relevant as you were interested to know why MobileNet performs so badly
I'm getting the following build errors:
/home/peter/Downloads/opencv/modules/dnn/src/cuda/math.hpp(23): error: identifier "hexp" is undefined
/home/peter/Downloads/opencv/modules/dnn/src/cuda/math.hpp(24): error: identifier "h2exp" is undefined
/home/peter/Downloads/opencv/modules/dnn/src/cuda/math.hpp(29): error: identifier "hexp" is undefined
/home/peter/Downloads/opencv/modules/dnn/src/cuda/math.hpp(30): error: identifier "h2exp" is undefined
/home/peter/Downloads/opencv/modules/dnn/src/cuda/math.hpp(53): error: identifier "hlog" is undefined
/home/peter/Downloads/opencv/modules/dnn/src/cuda/math.hpp(54): error: identifier "h2log" is undefined
/home/peter/Downloads/opencv/modules/dnn/src/cuda/math.hpp(59): error: more than one conversion function from "half" to a built-in type applies: function "half::operator float() const" function "half::operator short() const" function "half::operator unsigned short() const" function "half::operator int() const" function "half::operator unsigned int() const" function "half::operator long long() const" function "half::operator unsigned long long() const" function "half::operator nv_bool() const"
@pfeatherstone Can you upload your CMakeCache.txt? What device do you have?
@YashasSamaga I'm running Ubuntu 18, CUDA 10.1, cuDNN 7, and I have two Titan X cards. CMakeCache.txt
The CUDA backend provides a half-precision target (`DNN_TARGET_CUDA_FP16`) to further accelerate DNN inference. It makes use of half-precision intrinsics which are supported only on devices with CC 5.3 and above.
@pfeatherstone Your GPU's compute capability is 6.1 which is good enough. (Source: https://developer.nvidia.com/cuda-gpus#compute)
In your CMakeCache.txt, the `CUDA_ARCH_BIN` option appears to have been set to `3.0 3.5 3.7 5.0 5.2 6.0 6.1 7.0 7.5` (which I think is the default). Please change this option to `6.1` or a list of your choice where all architectures are CC 5.3+.
After building you can try running the tests:
[build dir]/bin/opencv_test_dnn
[build dir]/bin/opencv_perf_dnn
[TODO DONE] throw an error while configuring CMake for unsupported compute capabilities
This CC limitation is due only to the half-precision support. It's possible to have a build option to enable or disable half-precision support in the CUDA backend, but I wonder if this is of any use. I don't think people still use very old GPUs, and hence such an option might not be very useful.
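For context, a sketch of the kind of guard involved (half2 intrinsics such as `__hmul2` only exist on devices of CC 5.3+, which is why sm_50/sm_52 builds fail):

```cpp
#include <cuda_fp16.h>

// Scales an array of half2 pairs; the body compiles away on architectures
// below sm_53, where the half-precision arithmetic intrinsics are absent.
__global__ void scale_half2(__half2* data, __half2 factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 530
    if (i < n) data[i] = __hmul2(data[i], factor);
#endif
}
```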
Hi @YashasSamaga ,
Thanks for your reply earlier on. I have changed CUDA_ARCH_BIN
as per your suggestion.
I am trying to compile on windows 10. Still getting errors.
@Avrohom Can you try building once again from a fresh clone? You seem to have errors from many modules.
I am able to build on my PC with VS17. What compiler are you using?
Hi @YashasSamaga,
Many thanks for your kind help. I will do a fresh clone. I was using VS2019, which I had used to successfully build the official OpenCV master.
@YashasSamaga !
Nope. Not working. Did a fresh clone. Tried with VS2017.
@Avrohom
opencv_world.vcxproj
Why are you building in opencv_world mode? CUDA is not well supported in this mode.
@alalek Well, I managed to get the official master branch built in opencv_world mode, and it included CUDA.
Looks like it blames `std::max`, a common Windows problem: https://stackoverflow.com/questions/13416418/define-nominmax-using-stdmin-max/13420838

Try adding `#define NOMINMAX` somewhere (precomp.hpp, or via CMake with `add_definitions(/DNOMINMAX)`).
Hi @alalek, @YashasSamaga,

Many thanks. Yes, I had to fix that `std::max` issue. I didn't know where to put `#define NOMINMAX` though (I hadn't noticed your reply then), so I basically changed it to `(std::max)(...)`.
Also, I manually changed the code in modules/gapi/include/opencv2/gapi/infer.hpp to apply the fix described at https://github.com/opencv/opencv/commit/212f0fb5093ff8353cce602084d60225061f79f1#diff-e56cd60011ddca49858f11ce28fd3c31.
I am working with the 'opencv-cuda4dnn-csl-low' branch because it looked to me to be the most up to date. Am I missing something? To which branch has the above fix been merged?
I can confirm now that I have been able to build that repository successfully on windows including the opencv_world option.
Interestingly though, when testing the debug-mode build for Mask R-CNN inference, it takes approx. 1,150ms per frame and I get a few messages about the library 'falling back' onto the CPU for various operations, whereas when running the release version I do not get any of those 'fallback' messages and processing takes about 350ms per frame. Such a big difference is quite strange.
@Avrohom
I have run into the `std::max`/`std::min` issue after rebasing. I think a fix for MSVC should be added globally in CMake.
The debug build prevents many compiler optimizations which would improve the inference time. IMO, such a large difference in timings is normal.
The fallback messages are not displayed in release builds, but the fallbacks are still used. Refer to this comment to enable the messages in non-debug builds.
What are the next steps on this PR? I see we are waiting for reviews from @alalek and @dkurt. Is there anything holding up those reviews yet to be completed by @YashasSamaga?
More up-to-date info available here (unofficial)
How to build and use the CUDA backend?
How to use multiple GPUs?
Benchmarks
Demo Video: https://www.youtube.com/watch?v=ljCfluWYymM
Project summary/benchmarks: https://gist.github.com/YashasSamaga/a84cf2826ab2dc755005321fe17cd15d
Support Matrix for this PR
## Current Support Matrix: (not updated)

Blip | Meaning
---- | ---------
✔️ | supports all the configurations that are supported by all the existing backends (and might support more than what's currently supported)
🔵 | partially supported (fallback to CPU for unsupported configurations)
:x: | not supported (fallback to CPU)

Layer | Status | Constraints | Notes
----- | ------ | ----------- | -----
Activations | ✔️ | |
Batch Normalization | ✔️ | |
Blank Layer | ✔️ | |
Concat Layer | ✔️ | |
Const Layer | ✔️ | |
Convolution 2d | ✔️ | | asymmetric padding is disabled in layer constructor but the backend supports it
Convolution 3d | ✔️ | | asymmetric padding is disabled in layer constructor but the backend supports it
Crop and resize | :x: | |
Crop Layer | ✔️ | | forwarded to Slice Layer
Detection Output Layer | :x: | |
Deconvolution 2d | 🔵 | padding configuration should not lead to extra uneven padding |
Deconvolution 3d | 🔵 | padding configuration should not lead to extra uneven padding |
Elementwise Layers | ✔️ | |
Eltwise Layer | ✔️ | |
Flatten Layer | ✔️ | |
Fully Connected Layer | ✔️ | |
Input Layer | :x: | |
Interp Layer | ✔️ | |
Local Response Normalization | ✔️ | |
Max Unpooling 2d | ✔️ | |
Max Unpooling 3d | ✔️ | |
MVN Layer | :x: | |
Normalize Layer | 🔵 | Only L1 and L2 norm supported |
Padding Layer | ✔️ | |
Permute Layer | ✔️ | |
Pooling 2d | 🔵 | Only max and average pooling supported | supports asymmetric padding
Pooling 3d | 🔵 | Only max and average pooling supported | supports asymmetric padding
Prior Box Layer | ✔️ | |
Proposal Layer | :x: | |
Region Layer | ✔️ | NMS performed using CPU |
Reorg Layer | ✔️ | |
Reshape Layer | ✔️ | |
Resize Layer | ✔️ | |
Scale Layer | ✔️ | |
Shift Layer | ✔️ | | forwarded to Scale Layer
Shuffle Channel Layer | ✔️ | |
Slice Layer | ✔️ | |
Softmax Layer | ✔️ | |
Split Layer | ✔️ | |
LSTM Layer | :x: | |

Known issues:
References: #14585
Results: