Closed by Kongsea 6 years ago
Hi, I can see that you are using mxnet 1.0.0. Our readme, however, suggests using the official 1.1.0 version, or the newest 1.2.0 (no guarantees). If switching to a different version of mxnet causes too much trouble for you, I suggest that you do not use NaiveEngine, as we have never tested on this engine type. Maybe you can try the default MXNET_ENGINE_TYPE: ThreadedEnginePerDevice.
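For anyone landing here, a minimal sketch of switching the engine from inside a Python script rather than the shell. The helper name `select_engine` is hypothetical and not part of mxnet; the three engine names are the ones the MXNet environment-variable docs list. It assumes the variable has to be set before `import mxnet` runs, since the engine is created when the library loads.

```python
import os

# Engine names listed in the MXNet environment-variable documentation.
KNOWN_ENGINES = ("NaiveEngine", "ThreadedEngine", "ThreadedEnginePerDevice")


def select_engine(name="ThreadedEnginePerDevice", env=None):
    """Hypothetical helper: write MXNET_ENGINE_TYPE into the environment.

    Call this before `import mxnet`; setting the variable afterwards is
    assumed to have no effect on the already-created engine.
    """
    if name not in KNOWN_ENGINES:
        raise ValueError("unknown engine type: %r" % (name,))
    if env is None:
        env = os.environ
    env["MXNET_ENGINE_TYPE"] = name
    return name


# Usage: pick the default engine, then import mxnet as usual.
# select_engine()
# import mxnet as mx
```

The shell equivalent is to export `MXNET_ENGINE_TYPE=ThreadedEnginePerDevice` before launching the script, or simply to leave the variable unset.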
After changing to the default MXNET_ENGINE_TYPE, or upgrading to mxnet 1.2.0, it works well now.
Thank you.
After upgrading to mxnet 1.2.0, this error sometimes appears again when MXNET_ENGINE_TYPE is set to NaiveEngine.
Besides, it raised the following error occasionally:
mxnet.base.MXNetError: [12:01:59] src/operator/nn/./cudnn/cudnn_softmax_activation-inl.h:154: Check failed: e == CUDNN_STATUS_SUCCESS (3 vs. 0) cuDNN: CUDNN_STATUS_BAD_PARAM
But it sometimes works well again if I rerun the program, which seems very weird. Could you give me some help? Thank you. @chengdazhi
Hi, it has also been found that mxnet 1.2.0 installed by pip has this error. I suggest you build it from source or install 1.1.0.
Again, I don't see why you need NaiveEngine. We have never tested our code on NaiveEngine before.
I have downgraded to mxnet 1.1.0 and removed the NaiveEngine setting, and now it works well. Thank you.
After training for several epochs, it raised the following error:
I trained the network on just one GPU using CUDA_VISIBLE_DEVICES=0, although I have two GPUs. Please give me some advice or help to fix it. Thank you.
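For reference, a rough sketch of doing the same GPU restriction from inside the script rather than on the command line. The helper name `restrict_to_gpus` is hypothetical; it only writes the environment variable, and it assumes this happens before mxnet (or anything else that initializes CUDA) is imported.

```python
import os


def restrict_to_gpus(indices, env=None):
    """Hypothetical helper: expose only the given physical GPUs to CUDA.

    Must run before CUDA is initialized in this process, i.e. before
    `import mxnet`.
    """
    if env is None:
        env = os.environ
    value = ",".join(str(i) for i in indices)
    env["CUDA_VISIBLE_DEVICES"] = value
    return value


# Usage: same effect as launching with CUDA_VISIBLE_DEVICES=0.
# restrict_to_gpus([0])
# import mxnet as mx
```

Inside the process the visible devices are renumbered from zero, so with a single visible GPU the training script should address it as device 0 regardless of its physical index.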