Open: jlyw1017 opened this issue 5 years ago
I also ran into a similar problem when using newer versions of CUDA and cuDNN. If you use an older version, this error may go away. The error comes from the correlation layer. I will try to fix the bug after my vacation. Thank you.
OK, thank you! Happy Chinese New Year!
Important - read before submitting
Please read the guidelines for contributing before submitting this issue!
Please do not post installation, build, usage, or modeling questions, or other requests for help to Issues. Use the caffe-users list instead. This helps developers maintain a clear, uncluttered, and efficient view of the state of Caffe.
Issue 1 summary
I0216 23:13:33.054150 27966 net.cpp:202] conv1_1_3x3_s2 does not need backward computation.
I0216 23:13:33.054154 27966 net.cpp:202] input does not need backward computation.
I0216 23:13:33.054157 27966 net.cpp:244] This network produces output depth_conv6_disp
I0216 23:13:33.054411 27966 net.cpp:257] Network initialization done.
I0216 23:13:33.179075 27966 net.cpp:746] Ignoring source layer ImageDispData
I0216 23:13:33.219265 27966 net.cpp:746] Ignoring source layer depth_conv6_disp_depth_conv6_disp_0_split
I0216 23:13:33.219300 27966 net.cpp:746] Ignoring source layer disp_label_gather
I0216 23:13:33.219308 27966 net.cpp:746] Ignoring source layer disp_label_gather_disp_label_gather_0_split
I0216 23:13:33.219316 27966 net.cpp:746] Ignoring source layer disp_data_gather
I0216 23:13:33.219321 27966 net.cpp:746] Ignoring source layer disp_data_gather_disp_data_gather_0_split
I0216 23:13:33.219327 27966 net.cpp:746] Ignoring source layer disp_loss
I0216 23:13:33.219333 27966 net.cpp:746] Ignoring source layer disp_accuracy
I0216 23:13:33.219339 27966 net.cpp:746] Ignoring source layer disp_delta
I0216 23:13:33.219344 27966 net.cpp:746] Ignoring source layer disp_delta_gather
I0216 23:13:33.219350 27966 net.cpp:746] Ignoring source layer smooth_loss
Processing 000000_10.png
F0216 23:13:33.786833 27966 syncedmem.cpp:71] Check failed: error == cudaSuccess (2 vs. 0) out of memory
System configuration
* Operating system: ubuntu16.04
* CUDA version : 8.0
* CUDNN version (if applicable): 5.1.10
* Python version (if using pycaffe): 2.7.12
* GPU model (nvidia-smi in Ubuntu):
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130 Driver Version: 384.130 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 970M Off | 00000000:01:00.0 Off | N/A |
| N/A 59C P0 20W / N/A | 344MiB / 6080MiB | 2% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1149    G   /usr/lib/xorg/Xorg                             200MiB |
|    0      1617    G   qtcreator                                       15MiB |
|    0      2001    G   compiz                                         115MiB |
|    0      2224    G   fcitx-qimpanel                                   8MiB |
|    0      2357    G   /usr/lib/firefox/firefox                         2MiB |
+-----------------------------------------------------------------------------+
Issue check: an answer I found from others: "The error you get is indeed out of memory, but it's not the RAM, but rather GPU memory (note that the error comes from CUDA). Usually, when Caffe runs out of memory, the first thing to do is reduce the batch size (at the cost of gradient accuracy), but since you are already at batch size = 1..."
I don't know what to do. That answer is about training a net with Caffe, but I am only running inference, like this:
python models/get_disp.py --model_weights models/SegStereo_pre/SegStereo_pre_corr_kitti_ft.caffemodel --model_deploy models/SegStereo_pre/SegStereo_pre_corr_deploy_kitti.prototxt --data data/KITTI --result models/SegStereo_pre/result --gpu 0 --colorize --evaluate
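Since the batch size is already 1 here, the only other memory lever is the input resolution. Below is a minimal pycaffe sketch (not part of the repo's get_disp.py) that halves the spatial size of the input blobs before the forward pass, assuming the deploy prototxt has reshapable inputs. Note that feeding a downscaled image also changes the disparity scale, so this is only a memory experiment, not a drop-in fix.

import caffe

caffe.set_device(0)
caffe.set_mode_gpu()

net = caffe.Net('models/SegStereo_pre/SegStereo_pre_corr_deploy_kitti.prototxt',
                'models/SegStereo_pre/SegStereo_pre_corr_kitti_ft.caffemodel',
                caffe.TEST)

# Halve the spatial dimensions of every input blob before forward();
# the intermediate blobs (including the correlation output) shrink accordingly.
for name in net.inputs:
    n, c, h, w = net.blobs[name].data.shape
    net.blobs[name].reshape(n, c, h // 2, w // 2)
net.reshape()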
Another problem appears when I run the Caffe tests.
Issue 2 summary
make runtest
jr@jr-thunderobot:~/SegStereo-master/build$ make runtest
[  1%] Built target caffeproto
[ 71%] Built target caffe
[ 72%] Built target gtest
[ 73%] Building CXX object src/caffe/test/CMakeFiles/test.testbin.dir/test_deconvolution_layer.cpp.o
/home/jr/SegStereo-master/src/caffe/test/test_deconvolution_layer.cpp:8:47: fatal error: caffe/layers/cudnn_deconv_layer.hpp: No such file or directory
compilation terminated.
src/caffe/test/CMakeFiles/test.testbin.dir/build.make:643: recipe for target 'src/caffe/test/CMakeFiles/test.testbin.dir/test_deconvolution_layer.cpp.o' failed
make[3]: *** [src/caffe/test/CMakeFiles/test.testbin.dir/test_deconvolution_layer.cpp.o] Error 1
CMakeFiles/Makefile2:394: recipe for target 'src/caffe/test/CMakeFiles/test.testbin.dir/all' failed
make[2]: *** [src/caffe/test/CMakeFiles/test.testbin.dir/all] Error 2
CMakeFiles/Makefile2:367: recipe for target 'src/caffe/test/CMakeFiles/runtest.dir/rule' failed
make[1]: *** [src/caffe/test/CMakeFiles/runtest.dir/rule] Error 2
Makefile:253: recipe for target 'runtest' failed
make: *** [runtest] Error 2
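The fatal error above means the test file includes caffe/layers/cudnn_deconv_layer.hpp, which this source tree apparently does not ship. Not a fix, just a quick hypothetical check (run from the SegStereo-master root) to confirm which cuDNN layer headers are actually present before blaming the toolchain:

import glob
import os

# List the cuDNN layer headers shipped with this source tree.
for header in sorted(glob.glob(os.path.join('include', 'caffe', 'layers', 'cudnn_*.hpp'))):
    print(header)

missing = os.path.join('include', 'caffe', 'layers', 'cudnn_deconv_layer.hpp')
print('cudnn_deconv_layer.hpp present: %s' % os.path.exists(missing))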
I want to know: which one is the main problem? How do I solve issue 2 (I found no useful information on the internet)? And how do I solve issue 1 (which happens when running the model)?
Thank you all in advance.
@jlyw1017 I have the same error. Have you solved it?
@hnsywangxin @jlyw1017 Can you run "make runtest"? When I run make runtest, the system shows:
CXX src/caffe/test/test_stochastic_pooling.cpp
CXX src/caffe/test/test_deconvolution_layer.cpp
src/caffe/test/test_deconvolution_layer.cpp:8:47: fatal error: caffe/layers/cudnn_deconv_layer.hpp: No such file or directory
compilation terminated.
Makefile:591: recipe for target '.build_release/src/caffe/test/test_deconvolution_layer.o' failed
make: *** [.build_release/src/caffe/test/test_deconvolution_layer.o] Error 1
Do you know the reason? Can you share your system configuration and help me out? Thank you.
@jianrui1 My runtest is OK. My system configuration is: Ubuntu 16.04, GTX 1080 Ti, CUDA 9, cuDNN 7.1.
@hnsywangxin
Can you share some of your experience with me? WeChat: 464374800. Best wishes!
I also ran into a similar problem when using newer versions of CUDA and cuDNN. If you use an older version, this error may go away. The error comes from the correlation layer. I will try to fix the bug after my vacation. Thank you.
Hi @yangguorun, I'm running into the same problem. Could you kindly share which versions of CUDA and cuDNN worked for you?
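Not an answer, but here is a small sketch for checking which CUDA and cuDNN versions are installed locally, so they can be compared against whatever combination is reported to work. The /usr/local/cuda path is the usual default and may differ on your machine; on newer cuDNN releases the version macros live in cudnn_version.h instead of cudnn.h.

import re
import subprocess

# CUDA toolkit version as reported by nvcc.
print(subprocess.check_output(['nvcc', '--version']))

# cuDNN version from the header macros (cudnn.h for cuDNN <= 7).
with open('/usr/local/cuda/include/cudnn.h') as f:
    text = f.read()
version = [re.search(r'#define CUDNN_%s\s+(\d+)' % m, text).group(1)
           for m in ('MAJOR', 'MINOR', 'PATCHLEVEL')]
print('cuDNN %s' % '.'.join(version))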
Issue summary
I0203 19:40:28.183979 27526 net.cpp:257] Network initialization done.
I0203 19:40:28.292917 27526 net.cpp:746] Ignoring source layer ImageDispData
I0203 19:40:28.342500 27526 net.cpp:746] Ignoring source layer depth_conv6_disp_depth_conv6_disp_0_split
I0203 19:40:28.342512 27526 net.cpp:746] Ignoring source layer disp_label_gather
I0203 19:40:28.342530 27526 net.cpp:746] Ignoring source layer disp_label_gather_disp_label_gather_0_split
I0203 19:40:28.342535 27526 net.cpp:746] Ignoring source layer disp_data_gather
I0203 19:40:28.342536 27526 net.cpp:746] Ignoring source layer disp_data_gather_disp_data_gather_0_split
I0203 19:40:28.342540 27526 net.cpp:746] Ignoring source layer disp_loss
I0203 19:40:28.342542 27526 net.cpp:746] Ignoring source layer disp_accuracy
I0203 19:40:28.342545 27526 net.cpp:746] Ignoring source layer disp_delta
I0203 19:40:28.342547 27526 net.cpp:746] Ignoring source layer disp_delta_gather
I0203 19:40:28.342550 27526 net.cpp:746] Ignoring source layer smooth_loss
Processing 000000_10.png
F0203 19:40:29.342653 27526 math_functions.cu:79] Check failed: error == cudaSuccess (77 vs. 0) an illegal memory access was encountered
I meet this error when using get_disp.py. If I run make runtest, I find that cudnn_conv isn't there.
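One way to narrow this down (a sketch, not part of the repo): run the same deploy net once in CPU mode with dummy inputs. If the forward pass finishes on CPU but dies on GPU with the illegal memory access, that points at a GPU kernel such as the correlation layer mentioned above; if the correlation layer has no CPU implementation this will fail differently, which is still informative.

import numpy as np
import caffe

caffe.set_mode_cpu()

net = caffe.Net('models/SegStereo_pre/SegStereo_pre_corr_deploy_kitti.prototxt',
                'models/SegStereo_pre/SegStereo_pre_corr_kitti_ft.caffemodel',
                caffe.TEST)

# Feed random data of the right shape just to exercise every layer once.
for name in net.inputs:
    net.blobs[name].data[...] = np.random.rand(*net.blobs[name].data.shape)

print(net.forward().keys())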