msracver / Deformable-ConvNets

Deformable Convolutional Networks
MIT License

Error when Train My Own DataSets #13

Closed firestonelib closed 7 years ago

firestonelib commented 7 years ago

Hi @Orpine, I've read the Deformable ConvNets paper, and it's amazing! I now have a face dataset to train on, so I changed pascal_voc.py and config.py from 21 classes to 2 classes. Then I ran:

python ./experiments/rfcn/rfcn_end2end_train_test.py --cfg ./experiments/rfcn/cfgs/resnet_v1_101_voc0712_rfcn_dcn_end2end_ohem.yaml

but it fails with this error:

[14:53:44] /mnt/data1/daniel/mxnet0/dmlc-core/include/dmlc/./logging.h:304[14:53:44] /mnt/data1/daniel/mxnet0/dmlc-core/include/dmlc/./logging.h:304: : [14:53:44] /mnt/data1/daniel/mxnet0/mshadow/mshadow/./tensor_gpu-inl.h:35: Check failed: e == cudaSuccess CUDA: invalid device ordinal

Stack trace returned 6 entries:
[bt] (0) /mnt/data1/daniel/Python-2.7.13/build/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x29) [0x7fe9694c9ac9]
[bt] (1) /mnt/data1/daniel/Python-2.7.13/build/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZN7mshadow9SetDeviceINS_3gpuEEEvi+0xb8) [0x7fe96a166c18]
[bt] (2) /mnt/data1/daniel/Python-2.7.13/build/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x20) [0x7fe96a169460]
[bt] (3) /lib64/libstdc++.so.6(+0xb5220) [0x7fe98ce1d220]
[bt] (4) /lib64/libpthread.so.0(+0x7dc5) [0x7fe995a29dc5]
[bt] (5) /lib64/libc.so.6(clone+0x6d) [0x7fe99504f73d]

[14:53:44] /mnt/data1/daniel/mxnet0/mshadow/mshadow/./tensor_gpu-inl.h:35: Check failed: e == cudaSuccess CUDA: invalid device ordinal

Stack trace returned 6 entries:
[bt] (0) /mnt/data1/daniel/Python-2.7.13/build/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x29) [0x7fe9694c9ac9]
[bt] (1) /mnt/data1/daniel/Python-2.7.13/build/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZN7mshadow9SetDeviceINS_3gpuEEEvi+0xb8) [0x7fe96a166c18]
[bt] (2) /mnt/data1/daniel/Python-2.7.13/build/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x20) [0x7fe96a169460]
[bt] (3) /lib64/libstdc++.so.6(+0xb5220) [0x7fe98ce1d220]
[bt] (4) /lib64/libpthread.so.0(+0x7dc5) [0x7fe995a29dc5]
[bt] (5) /lib64/libc.so.6(clone+0x6d) [0x7fe99504f73d]

terminate called after throwing an instance of 'dmlc::Error'
  what():  [14:53:44] /mnt/data1/daniel/mxnet0/mshadow/mshadow/./tensor_gpu-inl.h:35: Check failed: e == cudaSuccess CUDA: invalid device ordinal

Stack trace returned 6 entries:
[bt] (0) /mnt/data1/daniel/Python-2.7.13/build/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x29) [0x7fe9694c9ac9]
[bt] (1) /mnt/data1/daniel/Python-2.7.13/build/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZN7mshadow9SetDeviceINS_3gpuEEEvi+0xb8) [0x7fe96a166c18]
[bt] (2) /mnt/data1/daniel/Python-2.7.13/build/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x20) [0x7fe96a169460]
[bt] (3) /lib64/libstdc++.so.6(+0xb5220) [0x7fe98ce1d220]
[bt] (4) /lib64/libpthread.so.0(+0x7dc5) [0x7fe995a29dc5]
[bt] (5) /lib64/libc.so.6(clone+0x6d) [0x7fe99504f73d]

terminate called recursively
Segmentation fault(core dumped)

I'd like to know why this happened and how to solve it.

YuwenXiong commented 7 years ago

The error says "invalid device ordinal", which means a requested GPU ID does not exist on your machine. Please make sure the GPU IDs in the YAML config file are correct.
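
As a rough illustration of that check, here is a minimal sketch that compares a list of requested GPU IDs with the number of GPUs on the machine. It assumes the IDs are written as a comma-separated string such as '0,1,2,3' (copied by hand from the config) and that `nvidia-smi -L` prints one line per device. The helper names are hypothetical and not part of this repository. Note that `nvidia-smi` does not take `CUDA_VISIBLE_DEVICES` into account, so the count may be higher than what the training process actually sees.

```python
import subprocess


def parse_gpu_ids(gpus):
    """Turn a comma-separated string such as '0,1,2,3' into a list of ints."""
    return [int(tok) for tok in str(gpus).split(',') if tok.strip()]


def invalid_gpu_ids(gpus, device_count):
    """Return the requested IDs that fall outside 0 .. device_count - 1."""
    return [i for i in parse_gpu_ids(gpus) if i < 0 or i >= device_count]


def count_gpus():
    """Count GPUs by listing them with `nvidia-smi -L` (assumed: one line per device)."""
    try:
        out = subprocess.check_output(['nvidia-smi', '-L'])
    except (OSError, subprocess.CalledProcessError):
        return 0  # nvidia-smi missing or failing: treat as no usable GPU
    lines = out.decode().splitlines()
    return len([line for line in lines if line.startswith('GPU ')])


if __name__ == '__main__':
    requested = '0,1,2,3'  # assumed value, copied by hand from the yaml config
    bad = invalid_gpu_ids(requested, count_gpus())
    print('GPU IDs not present on this machine: %s' % bad)
```

If the printed list is not empty, those IDs are the likely cause of the error, and removing them from the config should let training start.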

firestonelib commented 7 years ago

Thank you, @Orpine, that works well for me.