msracver / Deformable-ConvNets

Deformable Convolutional Networks
MIT License

The training was suspended in the middle but did not end, and the GPU-Util became 0%. #7

Open realwill opened 7 years ago

realwill commented 7 years ago

Thank you for sharing this wonderful work; it is really helpful. I have run into a problem during training: it hangs in epoch 0 around batch 3300 (the exact batch varies) and GPU-Util drops to 0%. Why?
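A hang of this kind (process still alive, GPU idle) can be narrowed down by having the process dump its Python stacks on a timer. A minimal sketch, assuming Python 3, where `faulthandler` is part of the standard library; the tracebacks later in this thread show Python 2.7, where it is a separate backport package. `watch_for_hang`, `stop_watching`, and the timeout value are hypothetical names chosen for illustration, not part of this repository:

```python
import faulthandler
import sys


def watch_for_hang(timeout_s=600, log_file=sys.stderr):
    """Dump the stacks of all threads to log_file every timeout_s seconds.

    Each dump shows the line every thread is executing when the timer
    fires, so if training stalls, the latest dump shows where it is stuck.
    """
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log_file)


def stop_watching():
    """Cancel the periodic dump, for example once training has finished."""
    faulthandler.cancel_dump_traceback_later()
```

Calling `watch_for_hang()` at the top of the training script and comparing consecutive dumps after a stall is one way to see whether the process is stuck in the same frame each time.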

baiyancheng20 commented 7 years ago

How did you install mxnet? I installed it according to http://mxnet.io/, but the code didn't work.

realwill commented 7 years ago

@baiyancheng20 Yes, I installed mxnet following http://mxnet.io

realwill commented 7 years ago

I debugged the training and traced the error to https://github.com/dmlc/mxnet/issues/3724

HaozhiQi commented 7 years ago

@realwill So did you fix the bug following that?

realwill commented 7 years ago

@Oh233 no.....

realwill commented 7 years ago

@Oh233 I get the same problem when training FCIS.

realwill commented 7 years ago

@Oh233 And it is the same after switching to another mxnet version: https://github.com/dmlc/mxnet/tree/62ecb60

mursalal commented 7 years ago

Hello, I get the same error after Epoch[0] Batch [800] (on the second try: Batch [700]):

Error in CustomOp.forward: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/mxnet-0.9.5-py2.7.egg/mxnet/operator.py", line 758, in forward_entry
    aux=tensors[4])
  File "experiments/rfcn/../../rfcn/operator_py/proposal.py", line 137, in forward
    keep = nms(det)
  File "experiments/rfcn/../../rfcn/../lib/nms/nms.py", line 20, in _nms
    return gpu_nms(dets, thresh, device_id)
  File "gpu_nms.pyx", line 29, in gpu_nms.gpu_nms (gpu_nms.cpp:1914)
IndexError: Out of bounds on buffer access (axis 0)
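The thread does not establish the cause of this IndexError. One possibility, stated here purely as an assumption, is that the proposal step occasionally hands NMS an empty or NaN-filled detection array, and taking the address of the first element of an empty buffer fails in exactly this way. A minimal sketch of a guard under that assumption follows; `guarded_nms` and `nms_numpy` are hypothetical helpers (not part of this repository), and the plain NumPy NMS only stands in for the compiled `gpu_nms` so the example is self-contained:

```python
import numpy as np


def nms_numpy(dets, thresh):
    """Greedy NMS over rows of [x1, y1, x2, y2, score]; returns kept row indices."""
    x1, y1, x2, y2, scores = (dets[:, k] for k in range(5))
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of box i with every remaining box.
        w = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]) + 1)
        h = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]) + 1)
        inter = w * h
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= thresh]  # drop boxes that overlap box i too much
    return keep


def guarded_nms(dets, thresh, nms_fn=nms_numpy):
    """Run nms_fn only on finite rows; return [] instead of failing on empty input."""
    dets = np.asarray(dets, dtype=np.float32)
    if dets.size == 0:
        return []
    valid_idx = np.flatnonzero(np.isfinite(dets).all(axis=1))
    if valid_idx.size == 0:
        return []
    keep = nms_fn(dets[valid_idx], thresh)
    # Map indices back to rows of the original array.
    return [int(valid_idx[k]) for k in keep]
```

If the guard ever returns an empty list during training, that would support the assumption and point at the inputs to the proposal operator rather than at NMS itself.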

chengshuai commented 7 years ago

hi @realwill @baiyancheng20

I compiled mxnet and copied the folder mxnet/python to external/mxnet, then ran the command python ./rfcn/demo.py. The error is:

  File "./rfcn/demo.py", line 28, in <module>
    from core.tester import im_detect, Predictor
  File "/home/chengshuai/test_sample/Deformable-ConvNets-master/rfcn/core/tester.py", line 15, in <module>
    from module import MutableModule
  File "/home/chengshuai/test_sample/Deformable-ConvNets-master/rfcn/core/module.py", line 19, in <module>
    from mxnet.initializer import Uniform, InitDesc
ImportError: cannot import name InitDesc

I do not know why this error happens. Could you tell me the process for running the deformable convolution network?

Thank you!
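An ImportError like the one above means that the `mxnet.initializer` module Python actually loaded does not define `InitDesc`. One possible reason, offered as an assumption rather than a diagnosis, is that a different (for example older) mxnet copy is being picked up instead of the freshly built one. A minimal sketch for checking which copy is imported; `describe_import` is a hypothetical helper name:

```python
import importlib


def describe_import(module_name, attr_name):
    """Return (file path of the imported module, whether it defines attr_name)."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module (or one of its parent packages) could not be imported.
        return None, False
    return getattr(module, "__file__", None), hasattr(module, attr_name)


if __name__ == "__main__":
    path, ok = describe_import("mxnet.initializer", "InitDesc")
    print("mxnet.initializer loaded from:", path)
    print("InitDesc available:", ok)
```

Running this from the repository root shows whether the path points at external/mxnet or at a system-wide installation.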

maozezhong commented 6 years ago

@mursalal Hello, have you solved the problem?

lclalalalala commented 6 years ago

Same problem:

  File "gpu_nms.pyx", line 29, in gpu_nms.gpu_nms
IndexError: Out of bounds on buffer access (axis 0)

jjprincess commented 5 years ago

Has anybody solved this problem?