zhreshold / mxnet-ssd

MXNet port of SSD: Single Shot MultiBox Object Detector. Reimplementation of https://github.com/weiliu89/caffe/tree/ssd
MIT License

Error training inceptionv3 #226

Open ssdevel opened 5 years ago

ssdevel commented 5 years ago

Hi, I want to train the inceptionv3 network. I use the following command:

python train.py --network inceptionv3 --prefix final\inception\new\ssd --finetune 1 --end-epoch 400 --num-class 1 --class-names billboard --data-shape 512 --num-example 2340 --batch-size 4 --train-path records\final_30_train.rec --val-path records\final_30_val.rec

I also tried adding the pretrained parameter. I renamed the files, but I got this error:

Traceback (most recent call last):
  File "train.py", line 149, in <module>
    tensorboard=args.tensorboard)
  File "C:\Users\stefa\Desktop\mxnet-ssd-master\train\train_net.py", line 256, in train_net
    exe = net.simple_bind(mx.cpu(), data=(1, 3, data_shape[0], data_shape[1]), label=(1, 1, 5), grad_req='null')
  File "C:\Users\stefa\Anaconda2\lib\site-packages\mxnet\symbol\symbol.py", line 1519, in simple_bind
    raise RuntimeError(error_msg)
RuntimeError: simple_bind error. Arguments:
data: (1, 3, 3, 512)
label: (1, 1, 5)
Error in operator conv_1_conv2d: [14:17:19] c:\jenkins\workspace\mxnet-tag\mxnet\src\operator\nn\convolution.cc:191: Check failed: dilated_ksizey <= AddPad(dshape[2], param.pad[0]) (3 vs. 1) kernel size exceed input
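A note on the shapes in this traceback: the bound input is `(1, 3, 3, 512)`, so the height is 3 rather than 512. The sketch below reproduces that arithmetic. It assumes that `data_shape` had been expanded to a `(channels, height, width)` tuple before the `simple_bind` call, and that the first Inception-v3 convolution is 3x3 with stride 2 and no padding. Neither assumption is confirmed in this thread, and `conv_out` is a hypothetical helper, not part of mxnet-ssd.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * pad - kernel) // stride + 1


# Assumption: --data-shape 512 was expanded to (channels, height, width).
data_shape = (3, 512, 512)

# Indexing as shown in the traceback: data=(1, 3, data_shape[0], data_shape[1])
bound = (1, 3, data_shape[0], data_shape[1])
print(bound)  # (1, 3, 3, 512), matching the reported arguments

# Assumption: the first convolution is 3x3, stride 2, no padding.
height_after_first_conv = conv_out(bound[2], kernel=3, stride=2)
print(height_after_first_conv)  # 1, too small for the next 3x3 kernel ("3 vs. 1")

# Indexing height and width instead would give the intended input shape.
intended = (1, 3, data_shape[1], data_shape[2])
print(intended)  # (1, 3, 512, 512)
```

Under these assumptions, the "kernel size exceed input" check fails because the second convolution sees a feature map that is only 1 pixel high.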

When I run this command:

python train.py --network inceptionv3 --prefix final\inception\ssd_inceptionv3_512 --begin-epoch 215 --end-epoch 400 --num-class 1 --class-names billboard --data-shape 512 --num-example 2340 --batch-size 4 --train-path records\final_30_train.rec --val-path records\final_30_val.rec --pretrained final\inception\ssd_inceptionv3_512

I got this error:

Traceback (most recent call last):
  File "train.py", line 149, in <module>
    tensorboard=args.tensorboard)
  File "C:\Users\stefa\Desktop\mxnet-ssd-master\train\train_net.py", line 355, in train_net
    monitor=monitor)
  File "C:\Users\stefa\Anaconda2\lib\site-packages\mxnet\module\base_module.py", line 488, in fit
    allow_missing=allow_missing, force_init=force_init)
  File "C:\Users\stefa\Anaconda2\lib\site-packages\mxnet\module\module.py", line 309, in init_params
    _impl(desc, arr, arg_params)
  File "C:\Users\stefa\Anaconda2\lib\site-packages\mxnet\module\module.py", line 297, in _impl
    cache_arr.copyto(arr)
  File "C:\Users\stefa\Anaconda2\lib\site-packages\mxnet\ndarray\ndarray.py", line 1970, in copyto
    return _internal._copyto(self, out=other)
  File "<string>", line 25, in _copyto
  File "C:\Users\stefa\Anaconda2\lib\site-packages\mxnet\_ctypes\ndarray.py", line 92, in _imperative_invoke
    ctypes.byref(out_stypes)))
  File "C:\Users\stefa\Anaconda2\lib\site-packages\mxnet\base.py", line 149, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [14:27:05] c:\jenkins\workspace\mxnet-tag\mxnet\src\operator\elemwise_op_common.h:123: Check failed: assign(&dattr, (*vec)[i]) Incompatible attr in node at 0-th output: expected [126], got [12]
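A note on the numbers 126 and 12: in an SSD head, the class-prediction layer typically has `anchors_per_location * (num_classes + 1)` output channels, the extra one being the background class. The sketch below shows that both numbers fit this formula if one assumes 6 anchors per location, a 20-class checkpoint (for example Pascal VOC) and the 1-class model requested on the command line. The anchor count and the 20-class origin are assumptions, not facts stated in this thread, and `cls_pred_channels` is a hypothetical helper.

```python
def cls_pred_channels(anchors_per_location, num_classes):
    """Channels of an SSD class-prediction layer (classes plus background)."""
    return anchors_per_location * (num_classes + 1)


ANCHORS = 6  # assumption: anchors per feature-map location

# Assumption: the loaded checkpoint was trained on 20 classes.
checkpoint_channels = cls_pred_channels(ANCHORS, num_classes=20)

# From the command line above: --num-class 1
current_channels = cls_pred_channels(ANCHORS, num_classes=1)

print(checkpoint_channels, current_channels)  # 126 12
```

If that reading is right, the parameter copy fails because a prediction-layer array saved for one class count is being loaded into a network built for a different class count.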

When I use this command:

python train.py --network inceptionv3 --prefix final\inception\ssd_inceptionv3_512 --begin-epoch 215 --end-epoch 400 --num-class 1 --class-names billboard --data-shape 512 --num-example 2340 --batch-size 4 --train-path records\final_30_train.rec --val-path records\final_30_val.rec

the model does not converge.

Can you help me?

Thanks

ssdevel commented 5 years ago

I solved this issue

NewbYang commented 5 years ago

> I solved this issue

Can you help me solve the same problem?

liuzhenhui commented 5 years ago

Can you tell me the reason? I got the same error, like this:

RuntimeError: simple_bind error. Arguments:
label: (40, 11, 6)
data: (40, 3, 160, 48)
Error in operator broadcast_mul0: [13:59:59] src/operator/tensor/./elemwise_binary_broadcast_op.h:68: Check failed: l == 1 || r == 1 operands could not be broadcast together with shapes [40,64,20,6] [40,64,21,7]

Thank you. My email: 1061441313@qq.com
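A note on the `broadcast_mul0` failure above: two shapes can be broadcast together only if, on every axis, the sizes are equal or one of them is 1. The sketch below applies that rule to the two shapes from the error message. `broadcast_mismatches` is a hypothetical helper written for illustration and handles only shapes of equal rank.

```python
def broadcast_mismatches(lhs, rhs):
    """Return (axis, left, right) for every axis that cannot be broadcast.

    Only handles shapes of equal rank, which is the case in the error above.
    """
    return [
        (axis, l, r)
        for axis, (l, r) in enumerate(zip(lhs, rhs))
        if l != r and l != 1 and r != 1
    ]


lhs = (40, 64, 20, 6)
rhs = (40, 64, 21, 7)
print(broadcast_mismatches(lhs, rhs))  # [(2, 20, 21), (3, 6, 7)]
```

Both spatial axes differ by one. The first shape, 20 x 6, equals the 160 x 48 input divided by 8, while the second is one larger on each axis; the thread does not say which layer produces the 21 x 7 map.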

ssdevel commented 5 years ago

I used this command and it worked:

python train.py --network inceptionv3 --prefix --begin-epoch 1 --end-epoch 200 --num-example 1 --class-names --data-shape 512 --num-example --batch-size --train-path --val-path