ijkguo / mx-rcnn

Parallel Faster R-CNN implementation with MXNet.

Check failed: error == cudaSuccess problem( proposal.cu ) #85

Closed jacky4323 closed 6 years ago

jacky4323 commented 6 years ago

Hi, I have a problem. Can anyone help me, please? I want to use some specific API functions from https://github.com/hpi-xnor/BMXNet, so I built MXNet from source with the commands below. With this build, the specific API functions run correctly.

$ git clone --recursive https://github.com/hpi-xnor/mxnet.git # remember to include the --recursive
$ mkdir build/Release && cd build/Release
$ cmake ../../
$ make -j8
$ export LD_LIBRARY_PATH=/build/Release
$ export PYTHONPATH=/python

Next, I want to combine this build with a detection model, so I ran the command below. demo.py runs correctly on the CPU. However, when I run python demo.py with --gpu 0, it fails with the error shown below. Any help would be appreciated! Thanks a lot!

Best Regards, PengWei

python demo.py --prefix final --epoch 0 --image myimage.jpg --gpu 0 --vis

Error Message:

python demo.py --prefix final --epoch 0 --image myimage.jpg --gpu 2 --vis
[20:24:19] /home/jacky4323/BMXNet_v1/mxnet/src/operator/././cudnn_algoreg-inl.h:107: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
[20:24:32] /home/jacky4323/BMXNet_v1/mxnet/dmlc-core/include/dmlc/logging.h:308: [20:24:32] /home/jacky4323/BMXNet_v1/mxnet/src/operator/contrib/proposal.cu:495: Check failed: error == cudaSuccess (7 vs. 0) too many resources requested for launch

Stack trace returned 10 entries: [bt] (0) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fcda10eae9c] [bt] (1) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(ZN5mxnet2op13ProposalGPUOpIN7mshadow3gpuEE7ForwardERKNS_9OpContextERKSt6vectorINS_5TBlobESaIS9_EERKS8_INS_9OpReqTypeESaISE_EESD_SD+0x12b9) [0x7fcda3ee92c9] [bt] (2) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(ZN5mxnet2op13OperatorState7ForwardERKNS_9OpContextERKSt6vectorINS_5TBlobESaIS6_EERKS5_INS_9OpReqTypeESaISB_EESA+0x36d) [0x7fcda13564ed] [bt] (3) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(_ZN5mxnet4exec23StatefulComputeExecutor3RunENS_10RunContextEb+0x69) [0x7fcda125de69] [bt] (4) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(+0x992210) [0x7fcda1222210] [bt] (5) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x93) [0x7fcda1119a83] [bt] (6) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(_ZN5mxnet6engine23ThreadedEnginePerDevice9GPUWorkerILN4dmlc19ConcurrentQueueTypeE0EEEvNS_7ContextEbPNS1_17ThreadWorkerBlockIXT_EEESt10shared_ptrINS0_10ThreadPool11SimpleEventEE+0x10b) [0x7fcda112289b] [bt] (7) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataOS5+0x63) [0x7fcda1122ac3] [bt] (8) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x4a) [0x7fcda111c22a] [bt] (9) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) 
[0x7fce1fc54c80]

[20:24:32] /home/jacky4323/BMXNet_v1/mxnet/dmlc-core/include/dmlc/logging.h:308: [20:24:32] /home/jacky4323/BMXNet_v1/mxnet/src/engine/./threaded_engine.h:359: [20:24:32] /home/jacky4323/BMXNet_v1/mxnet/src/operator/contrib/proposal.cu:495: Check failed: error == cudaSuccess (7 vs. 0) too many resources requested for launch

Stack trace returned 10 entries: [bt] (0) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fcda10eae9c] [bt] (1) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(ZN5mxnet2op13ProposalGPUOpIN7mshadow3gpuEE7ForwardERKNS_9OpContextERKSt6vectorINS_5TBlobESaIS9_EERKS8_INS_9OpReqTypeESaISE_EESD_SD+0x12b9) [0x7fcda3ee92c9] [bt] (2) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(ZN5mxnet2op13OperatorState7ForwardERKNS_9OpContextERKSt6vectorINS_5TBlobESaIS6_EERKS5_INS_9OpReqTypeESaISB_EESA+0x36d) [0x7fcda13564ed] [bt] (3) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(_ZN5mxnet4exec23StatefulComputeExecutor3RunENS_10RunContextEb+0x69) [0x7fcda125de69] [bt] (4) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(+0x992210) [0x7fcda1222210] [bt] (5) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x93) [0x7fcda1119a83] [bt] (6) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(_ZN5mxnet6engine23ThreadedEnginePerDevice9GPUWorkerILN4dmlc19ConcurrentQueueTypeE0EEEvNS_7ContextEbPNS1_17ThreadWorkerBlockIXT_EEESt10shared_ptrINS0_10ThreadPool11SimpleEventEE+0x10b) [0x7fcda112289b] [bt] (7) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataOS5+0x63) [0x7fcda1122ac3] [bt] (8) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x4a) [0x7fcda111c22a] [bt] (9) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) 
[0x7fce1fc54c80]

A fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 8 entries: [bt] (0) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fcda10eae9c] [bt] (1) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x36b) [0x7fcda1119d5b] [bt] (2) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(_ZN5mxnet6engine23ThreadedEnginePerDevice9GPUWorkerILN4dmlc19ConcurrentQueueTypeE0EEEvNS_7ContextEbPNS1_17ThreadWorkerBlockIXT_EEESt10shared_ptrINS0_10ThreadPool11SimpleEventEE+0x10b) [0x7fcda112289b] [bt] (3) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataOS5+0x63) [0x7fcda1122ac3] [bt] (4) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x4a) [0x7fcda111c22a] [bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7fce1fc54c80] [bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fce2d40a6ba] [bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fce2d1403dd]

terminate called after throwing an instance of 'dmlc::Error' what(): [20:24:32] /home/jacky4323/BMXNet_v1/mxnet/src/engine/./threaded_engine.h:359: [20:24:32] /home/jacky4323/BMXNet_v1/mxnet/src/operator/contrib/proposal.cu:495: Check failed: error == cudaSuccess (7 vs. 0) too many resources requested for launch

Stack trace returned 10 entries: [bt] (0) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fcda10eae9c] [bt] (1) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(ZN5mxnet2op13ProposalGPUOpIN7mshadow3gpuEE7ForwardERKNS_9OpContextERKSt6vectorINS_5TBlobESaIS9_EERKS8_INS_9OpReqTypeESaISE_EESD_SD+0x12b9) [0x7fcda3ee92c9] [bt] (2) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(ZN5mxnet2op13OperatorState7ForwardERKNS_9OpContextERKSt6vectorINS_5TBlobESaIS6_EERKS5_INS_9OpReqTypeESaISB_EESA+0x36d) [0x7fcda13564ed] [bt] (3) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(_ZN5mxnet4exec23StatefulComputeExecutor3RunENS_10RunContextEb+0x69) [0x7fcda125de69] [bt] (4) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(+0x992210) [0x7fcda1222210] [bt] (5) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x93) [0x7fcda1119a83] [bt] (6) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(_ZN5mxnet6engine23ThreadedEnginePerDevice9GPUWorkerILN4dmlc19ConcurrentQueueTypeE0EEEvNS_7ContextEbPNS1_17ThreadWorkerBlockIXT_EEESt10shared_ptrINS0_10ThreadPool11SimpleEventEE+0x10b) [0x7fcda112289b] [bt] (7) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataOS5+0x63) [0x7fcda1122ac3] [bt] (8) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x4a) [0x7fcda111c22a] [bt] (9) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) 
[0x7fce1fc54c80]

A fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 8 entries: [bt] (0) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fcda10eae9c] [bt] (1) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x36b) [0x7fcda1119d5b] [bt] (2) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(_ZN5mxnet6engine23ThreadedEnginePerDevice9GPUWorkerILN4dmlc19ConcurrentQueueTypeE0EEEvNS_7ContextEbPNS1_17ThreadWorkerBlockIXT_EEESt10shared_ptrINS0_10ThreadPool11SimpleEventEE+0x10b) [0x7fcda112289b] [bt] (3) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataOS5+0x63) [0x7fcda1122ac3] [bt] (4) /home/jacky4323/BMXNet_v1/mxnet/python/mxnet/../../build/Release/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x4a) [0x7fcda111c22a] [bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7fce1fc54c80] [bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fce2d40a6ba] [bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fce2d1403dd]
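
The log itself suggests rerunning with MXNET_ENGINE_TYPE set to NaiveEngine so that the backtrace points at the failing call. One way to try that without changing the shell environment is to set the variable only for a child process. The sketch below assumes demo.py is in the current directory and reuses the arguments from the failing run; the helper names naive_engine_env and run_demo_synchronously are made up for illustration.

```python
import os
import subprocess
import sys


def naive_engine_env(base_env=None):
    """Copy an environment and request MXNet's synchronous NaiveEngine in the copy."""
    env = dict(os.environ if base_env is None else base_env)
    env["MXNET_ENGINE_TYPE"] = "NaiveEngine"
    return env


# Same arguments as the failing GPU run reported above.
DEMO_CMD = [sys.executable, "demo.py", "--prefix", "final", "--epoch", "0",
            "--image", "myimage.jpg", "--gpu", "0", "--vis"]


def run_demo_synchronously():
    """Run demo.py in a child process that sees MXNET_ENGINE_TYPE=NaiveEngine."""
    # Only the child sees the variable, so nothing has to be reset afterwards.
    return subprocess.run(DEMO_CMD, env=naive_engine_env(), check=False)
```

Calling run_demo_synchronously() should reproduce the failure with operations executed one at a time. Because the parent environment is never modified, the "set MXNET_ENGINE_TYPE back to empty" step from the log is not needed.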

ijkguo commented 6 years ago

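mxnet/src/operator/contrib/proposal.cu:495: Check failed: error == cudaSuccess (7 vs. 0) too many resources requested for launch may indicate that the kernel is launched with more threads per block than your GPU can supply registers for, rather than a shortage of CUDA cores as such.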

Please try a different kernel launch configuration (for example, fewer threads per block) around proposal.cu:495.
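
As a rough illustration of what a different launch configuration could look like, the sketch below does the arithmetic in Python rather than CUDA. It picks the largest block size whose register use fits one block's budget, then recomputes the grid size so every item is still covered. The name pick_launch_config, the 80 registers per thread, and the 65536-register and 1024-thread limits are assumptions for illustration, not values taken from proposal.cu or from the GPU in this report. Real register allocation also has per-warp granularity that this simple product ignores.

```python
def pick_launch_config(n_items, regs_per_thread, regs_per_block=65536,
                       max_threads=1024, warp_size=32):
    """Return (blocks, threads_per_block) for a 1-D launch within the register budget.

    The default limits are illustrative. On a real system they would come from
    cudaGetDeviceProperties (device limits) and cudaFuncGetAttributes (kernel
    register count).
    """
    if n_items <= 0 or regs_per_thread <= 0:
        raise ValueError("n_items and regs_per_thread must be positive")
    # Largest thread count whose registers fit in one block's budget.
    threads = min(max_threads, regs_per_block // regs_per_thread)
    # Round down to a whole number of warps.
    threads -= threads % warp_size
    if threads == 0:
        raise ValueError("kernel needs too many registers for even one warp")
    # Ceil-divide so every item is still handled by some thread.
    blocks = (n_items + threads - 1) // threads
    return blocks, threads


# A register-heavy kernel: 1024 threads would need 81920 registers, over budget.
print(pick_launch_config(6000, 80))   # (8, 800)
# A lighter kernel can keep the full 1024 threads per block.
print(pick_launch_config(6000, 32))   # (6, 1024)
```

If the kernel's indexing hardcodes the block size, the kernel body would need a matching change. Capping register use at compile time (for example with `__launch_bounds__`) is another option to consider.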

ijkguo commented 6 years ago

Closed due to inactivity.