wanglimin / dense_flow

OpenCV Implementation of different optical flow algorithms

Gpu API call (invalid device function) in call #6

Closed geekvc closed 8 years ago

geekvc commented 8 years ago

I used the tool to extract test images from my test.avi, following the usage instructions:

./denseFlow_gpu -f test.avi -x tmp/flow_x -y tmp/flow_x -i tmp/image -b 20 -t 1 -d 0 -s 1

and got this error:

OpenCV Error: Gpu API call (invalid device function) in call, file /home/uuz/Downloads/opencv/Install-OpenCV-master/Ubuntu/2.4/OpenCV/opencv-2.4.10/modules/gpu/include/opencv2/gpu/device/detail/transform_detail.hpp, line 361
terminate called after throwing an instance of 'cv::Exception'
  what():  /home/uuz/Downloads/opencv/Install-OpenCV-master/Ubuntu/2.4/OpenCV/opencv-2.4.10/modules/gpu/include/opencv2/gpu/device/detail/transform_detail.hpp:361: error: (-217) invalid device function in function call

[1]    16872 abort (core dumped)  ./denseFlow_gpu -f test.avi -x tmp/flow_x -y tmp/flow_x -i tmp/image -b 20 -t

After that I added

set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -arch compute_35 -code sm_35)

to CMakeLists.txt and ran make again, but the error is the same as above. I do not know how to solve it. Thank you in advance!

wanglimin commented 8 years ago

I have not encountered this issue before either. It seems to be a problem with your GPU architecture and driver. I found a similar problem report at the following link:

https://bitbucket.org/rodrigob/doppia/issues/85/opencv-error-gpu-api-call-invalid-device

Perhaps you could try what is suggested there.

geekvc commented 8 years ago

Thank you very much! I tried the code and got this:

CUDA Device Query...
There are 4 CUDA devices.

CUDA Device #0
Major revision number:         3
Minor revision number:         5
Name:                          Tesla K40c
Total global memory:           4294770688
Total shared memory per block: 49152
Total registers per block:     65536
Warp size:                     32
Maximum memory pitch:          2147483647
Maximum threads per block:     1024
Maximum dimension 0 of block:  1024
Maximum dimension 1 of block:  1024
Maximum dimension 2 of block:  64
Maximum dimension 0 of grid:   2147483647
Maximum dimension 1 of grid:   65535
Maximum dimension 2 of grid:   65535
Clock rate:                    875500
Total constant memory:         65536
Texture alignment:             512
Concurrent copy and execution: Yes
Number of multiprocessors:     15
Kernel execution timeout:      No

CUDA Device #1
Major revision number:         3
Minor revision number:         5
Name:                          Tesla K40c
Total global memory:           4294770688
Total shared memory per block: 49152
Total registers per block:     65536
Warp size:                     32
Maximum memory pitch:          2147483647
Maximum threads per block:     1024
Maximum dimension 0 of block:  1024
Maximum dimension 1 of block:  1024
Maximum dimension 2 of block:  64
Maximum dimension 0 of grid:   2147483647
Maximum dimension 1 of grid:   65535
Maximum dimension 2 of grid:   65535
Clock rate:                    875500
Total constant memory:         65536
Texture alignment:             512
Concurrent copy and execution: Yes
Number of multiprocessors:     15
Kernel execution timeout:      No

CUDA Device #2
Major revision number:         3
Minor revision number:         5
Name:                          Tesla K40c
Total global memory:           4294770688
Total shared memory per block: 49152
Total registers per block:     65536
Warp size:                     32
Maximum memory pitch:          2147483647
Maximum threads per block:     1024
Maximum dimension 0 of block:  1024
Maximum dimension 1 of block:  1024
Maximum dimension 2 of block:  64
Maximum dimension 0 of grid:   2147483647
Maximum dimension 1 of grid:   65535
Maximum dimension 2 of grid:   65535
Clock rate:                    875500
Total constant memory:         65536
Texture alignment:             512
Concurrent copy and execution: Yes
Number of multiprocessors:     15
Kernel execution timeout:      No

CUDA Device #3
Major revision number:         3
Minor revision number:         5
Name:                          Tesla K40c
Total global memory:           4294770688
Total shared memory per block: 49152
Total registers per block:     65536
Warp size:                     32
Maximum memory pitch:          2147483647
Maximum threads per block:     1024
Maximum dimension 0 of block:  1024
Maximum dimension 1 of block:  1024
Maximum dimension 2 of block:  64
Maximum dimension 0 of grid:   2147483647
Maximum dimension 1 of grid:   65535
Maximum dimension 2 of grid:   65535
Clock rate:                    875500
Total constant memory:         65536
Texture alignment:             512
Concurrent copy and execution: Yes
Number of multiprocessors:     15
Kernel execution timeout:      No

Then I set the following in CMakeLists.txt:

set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -arch compute_35 -code sm_35)

and ran make clean followed by make, but the error is still the same as above.
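For reference, the Major and Minor revision numbers printed by the device query are the digits that go into the -arch and -code flags: 3 and 5 give compute_35 and sm_35. Below is a minimal shell sketch of that mapping, assuming the output format shown in this thread. The function names are made up for illustration and are not part of dense_flow or CUDA.

```shell
#!/bin/sh
# Hypothetical helper: read device-query text on stdin and print each
# distinct compute architecture as "<major><minor>", one per line.
arch_from_device_query() {
    awk '
        /^Major revision number:/ { major = $NF }
        /^Minor revision number:/ { print major $NF }
    ' | sort -u
}

# Hypothetical helper: turn an architecture such as 35 into the nvcc
# flags used in the set(CUDA_NVCC_FLAGS ...) line quoted in this thread.
nvcc_arch_flags() {
    printf '%s\n' "-arch compute_$1 -code sm_$1"
}
```

Piping the device query output through arch_from_device_query prints one line per distinct architecture, so a machine with four identical cards yields a single value that can be handed to nvcc_arch_flags.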

wanglimin commented 8 years ago

Maybe you could find other solutions on Google. I did not encounter this problem before.

geekvc commented 8 years ago

Thank you all the same! I am trying it on a different type of GPU.

geekvc commented 8 years ago

On the Tesla K20, the error disappeared. Thank you.

KnightOfTheMoonlight commented 7 years ago

I solved this problem by reinstalling CUDA from the xxx.deb file.

fanser commented 7 years ago

@KnightOfTheMoonlight I have run into the same issue:

OpenCV Error: Gpu API call (invalid device function) in call, file /home/fzy/install/opencv-2.4.10/modules/gpu/include/opencv2/gpu/device/detail/transform_detail.hpp, line 361
terminate called after throwing an instance of 'cv::Exception'
  what():  /home/fzy/install/opencv-2.4.10/modules/gpu/include/opencv2/gpu/device/detail/transform_detail.hpp:361: error: (-217) invalid device function in function call

So how can I solve the problem without changing the GPU? I use CUDA 8.0 + OpenCV 2.4.10.

fanser commented 7 years ago

I solved this problem the way @geekvc described. Add this line to the CMakeLists.txt file:

set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -arch compute_52 -code sm_52)

because my compute architecture is 52. Then delete the build files by running

rm -r ./build

to make sure no CMake cache file remains. (It did not work for @geekvc, maybe because he did not delete all the CMake cache files.) Then run

make
sudo make install

and it works!
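The clean-rebuild step above can be sketched as a small shell script. This is only a sketch under assumptions: it assumes an out-of-source build in ./build with CMakeLists.txt in the current directory, and it re-runs cmake after the directory is removed, a step the comment above does not spell out. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: remove a CMake build directory so that no cached
# setting (such as an old GPU architecture) survives into the next build.
wipe_build_dir() {
    build_dir="${1:-./build}"
    # Nothing to do if the directory does not exist.
    [ -d "$build_dir" ] || return 0
    rm -r -- "$build_dir"
}

# Hypothetical helper: configure and build from scratch.
# Assumes CMakeLists.txt is in the current directory (an assumption).
clean_rebuild() {
    build_dir="${1:-./build}"
    wipe_build_dir "$build_dir" &&
    mkdir -p "$build_dir" &&
    ( cd "$build_dir" && cmake .. && make )
}
```

Calling clean_rebuild from the project's source directory would then reconfigure with the edited CMakeLists.txt; sudo make install can follow, as in the comment above.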

shamoqianting commented 6 years ago

I ran into a similar problem:

OpenCV Error: Gpu API call (unknown error) in mallocPitch, file /data1/temporal-segment-networks-master/3rd-party/opencv-2.4.13/modules/dynamicuda/include/opencv2/dynamicuda/dynamicuda.hpp, line 1134
terminate called after throwing an instance of 'cv::Exception'
  what():  /data1/temporal-segment-networks-master/3rd-party/opencv-2.4.13/modules/dynamicuda/include/opencv2/dynamicuda/dynamicuda.hpp:1134: error: (-217) unknown error in function mallocPitch

Although I deleted the build folder and added the line set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -arch compute_37 -code sm_37) to the CMakeLists.txt file, the problem is still there.

Has anyone solved this without reinstalling CUDA?

shamoqianting commented 6 years ago

@KnightOfTheMoonlight Which .deb file did you use? Could you give more details? Thank you very much.

pengxiaoxiao commented 5 years ago

I have run into the same problem. How can I solve it? My device is a GeForce GTX 1080 Ti/PCIe/SSE2 with CUDA 9.0.

pengxiaoxiao commented 5 years ago

I added "set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -arch compute_61 -code sm_61)" to CMakeLists.txt.

sucaohan commented 5 years ago

@pengxiaoxiao Have you solved this? I am running into the same problem now, with CUDA 9.

pengxiaoxiao commented 5 years ago

Reinstall CUDA 8.

sucaohan commented 5 years ago

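@pengxiaoxiao But may I ask, can two CUDA versions be installed on one Ubuntu system? A lot of software is already installed on this machine, and I cannot uninstall CUDA 9.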

pengxiaoxiao commented 5 years ago

I don't think so!
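On the side-by-side question: NVIDIA's toolkit installers usually place each version in its own directory (for example /usr/local/cuda-8.0 next to /usr/local/cuda-9.0), so two toolkits can often coexist and be selected per shell session. Below is a minimal sketch under that assumption. The directory layout and the helper name use_cuda_root are illustrative and have not been confirmed for any machine in this thread.

```shell
#!/bin/sh
# Hypothetical helper: put one CUDA toolkit first on the search paths.
# $1 is the toolkit root, e.g. /usr/local/cuda-8.0 (an assumed location).
use_cuda_root() {
    root="$1"
    if [ ! -d "$root/bin" ]; then
        echo "no CUDA toolkit found under $root" >&2
        return 1
    fi
    # Prepend so this toolkit's nvcc and libraries are found first.
    PATH="$root/bin:$PATH"
    LD_LIBRARY_PATH="$root/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export PATH LD_LIBRARY_PATH
}
```

After something like use_cuda_root /usr/local/cuda-8.0, OpenCV and dense_flow would still need to be reconfigured from a clean build directory so that CMake picks up the selected toolkit.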

sucaohan commented 5 years ago

@pengxiaoxiao OK, thank you.

ZWJ-here commented 4 years ago

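This is a compute capability mismatch. Before any setting was changed, OpenCV's cmake output looked like this, with compute capabilities 30 35 37:

NVIDIA CUDA
  Use CUFFT: YES
  Use CUBLAS: YES
  USE NVCUVID: NO
  NVIDIA GPU arch: 30 35 37
  NVIDIA PTX archs:
  Use fast math: NO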

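Note: 6.1 is the compute capability of the GTX 1080. For a different graphics card, change the value to match that card's compute capability.

To find your card's compute capability, run deviceQuery from the CUDA samples.

(Build the samples in the NVIDIA_CUDA-_Samples folder; the part omitted from the folder name is the version number.)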

If the setting takes effect, cmake shows the following (my card is a 1080 Ti):

NVIDIA CUDA
  Use CUFFT: YES
  Use CUBLAS: YES
  USE NVCUVID: NO
  NVIDIA GPU arch: 61
  NVIDIA PTX archs: 61
  Use fast math: NO

GPU arch and PTX archs are both set to 6.1. If you are unlucky, however, adding the compile option does not solve the problem. In that case you need to edit the OpenCV configuration file that handles CUDA compute capability, ./cmake/OpenCVDetectCUDA.cmake. Before the lines

set(CUDA_ARCH_BIN ${cuda_arch_bin} CACHE STRING "Specify 'real' GPU architectures to build binaries for, BIN(PTX) format is supported")
set(CUDA_ARCH_PTX ${cuda_arch_ptx} CACHE STRING "Specify 'virtual' PTX architectures to build PTX intermediate code for")

add

set(cuda_arch_bin "6.1")
set(cuda_arch_ptx "6.1")

After saving, run OpenCV's cmake, make and make install again. The cmake summary shown above should then report the correct compute capability, 61.

Finally, rebuild dense_flow.
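CUDA_ARCH_BIN and CUDA_ARCH_PTX, quoted above, are CMake cache variables, so one way to set them is with -D options when configuring OpenCV. Below is a minimal shell sketch, assuming that approach fits your OpenCV version. Both helper names are hypothetical, and the summary check only looks for the "NVIDIA GPU arch" line shown in this comment.

```shell
#!/bin/sh
# Hypothetical helper: print the -D options that set both OpenCV cache
# variables to the same compute capability (for example 6.1).
opencv_cuda_arch_args() {
    cc="$1"
    printf '%s\n' "-D CUDA_ARCH_BIN=$cc -D CUDA_ARCH_PTX=$cc"
}

# Hypothetical helper: succeed only if a saved cmake summary lists the
# expected architecture (for example 61) on its "NVIDIA GPU arch" line.
summary_has_arch() {
    arch="$1"
    summary_file="$2"
    grep 'NVIDIA GPU arch:' "$summary_file" | grep -qw -- "$arch"
}
```

A possible use is to run cmake $(opencv_cuda_arch_args 6.1) on the OpenCV source tree from a fresh build directory, save the output to a file, and check it with summary_has_arch 61 before running make.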