Open opraveen opened 8 years ago
It seems you used an input shape other than the 224 that Inception-BN requires. VGG and Inception-BN use different input shapes, as mentioned in the Kaggle forum posts, so please do not reuse the VGG input for the Inception-BN model.
This is the problem: mxnet #2585. Please check https://github.com/dmlc/mxnet/pull/2585
@lbin you are right, one needs to add pad=(1, 1) to the pooling layer, for example:
pool = mx.symbol.Pooling(data=data, kernel=(3, 3), stride=(2, 2), pad=(1, 1), pool_type='max', attr=mirror_attr)
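To see why the padding matters, here is a minimal sketch of the output-size arithmetic for a 3x3, stride-2 pooling layer. It assumes the mismatch comes from floor versus ceil rounding (MXNet's "valid" and "full" pooling conventions) and that the feature map entering the failing block is 28x28. That size is an assumption chosen to match the 13 vs 14 in the error message, not something stated in the log. The helper `pool_out_size` is hypothetical and not part of MXNet.

```python
import math


def pool_out_size(in_size, kernel, stride, pad=0, convention="valid"):
    """Output size of a pooling layer along one spatial dimension.

    Hypothetical helper for illustration only, not an MXNet function.
    "valid" rounds down (floor), "full" rounds up (ceil).
    """
    span = in_size + 2 * pad - kernel
    if convention == "valid":
        return span // stride + 1
    if convention == "full":
        return math.ceil(span / stride) + 1
    raise ValueError("convention must be 'valid' or 'full'")


IN_SIZE = 28  # assumed feature-map size, chosen to match the error message

no_pad = pool_out_size(IN_SIZE, kernel=3, stride=2, pad=0)
with_pad = pool_out_size(IN_SIZE, kernel=3, stride=2, pad=1)
full_no_pad = pool_out_size(IN_SIZE, kernel=3, stride=2, pad=0, convention="full")

print("valid, pad=0:", no_pad)       # 13, cannot be concatenated with 14x14 branches
print("valid, pad=1:", with_pad)     # 14, matches the stride-2 convolution branches
print("full,  pad=0:", full_no_pad)  # 14, ceil rounding gives 14 even without padding
```

Under these assumptions, the unpadded pooling branch yields 13x13 while the other branches yield 14x14, which is the kind of mismatch the Concat check reports. Adding pad=(1, 1) brings the pooling branch back to 14x14.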
while VGG has no problems
I created the train and val .rec files, and when I run the training script with Inception-BN model, I notice this incorrect shape error:
$ ./run.cv_inception_bn.sh
2016-07-17 01:02:57,342 Node[0] start with arguments Namespace(batch_size=32, clip_gradient=5.0, data_dir='./', data_shape=224, dataset='ft', finetune_from='model/Inception_BN-0039', finetune_lr_scale=10, gpus='0', kv_store='local', load_epoch=None, log_dir='./tmp/', log_file=None, lr=0.001, lr_factor=1, lr_factor_epoch=1, model_prefix='./model/ckpt-shuffle1', network='inception-bn', num_classes=10, num_epochs=30, num_examples=216, train_dataset='sf1_train.rec', val_dataset='sf1_val.rec')
2016-07-17 01:02:57,342 Node[0] finetune from model/Inception_BN at epoch 39
[01:02:57] src/io/iter_image_recordio.cc:211: ImageRecordIOParser: ./sf1_train.rec, use 1 threads for decoding..
[01:02:57] src/io/./iter_normalize.h:218: Cannot find mean.bin: create mean image, this will take some time...
[01:03:11] src/io/./iter_normalize.h:231: 10000 images processed, 13.6055 sec elapsed
[01:03:21] src/io/./iter_normalize.h:231: 20000 images processed, 23.6031 sec elapsed
[01:03:21] src/io/./iter_normalize.h:244: Save mean image to mean.bin..
[01:03:22] src/io/iter_image_recordio.cc:211: ImageRecordIOParser: ./sf1_val.rec, use 1 threads for decoding..
[01:03:22] src/io/./iter_normalize.h:103: Load mean image from mean.bin
2016-07-17 01:03:24,226 Node[0] lr_scale: {'fc1_ft_weight': 10, 'softmax_label': 10, 'fc1_ft_bias': 10}
[01:03:24] ../mxnet/dmlc-core/include/dmlc/logging.h:235: [01:03:24] src/operator/./concat-inl.h:152: Check failed: (dshape[j]) == (tmp[j]) Incorrect shape[2]: (32,320,13,13). (first input shape: (32,576,14,14))
Traceback (most recent call last):
File "train_inception_bn.py", line 92, in
train_model.fit(args, net, get_iterator)
File " ../kaggle_statefarm/inception/train_model.py", line 119, in fit
epoch_end_callback = checkpoint)
File "../mxnet/python/mxnet/model.py", line 746, in fit
self._init_params(dict(data.provide_data+data.provide_label))
File "../mxnet/python/mxnet/model.py", line 486, in _init_params
arg_shapes, _, aux_shapes = self.symbol.infer_shape(**input_shapes)
File "../mxnet/python/mxnet/symbol.py", line 453, in infer_shape
return self._infer_shape_impl(False, *args, **kwargs)
File "../mxnet/python/mxnet/symbol.py", line 513, in _infer_shape_impl
ctypes.byref(complete)))
File "../mxnet/python/mxnet/base.py", line 77, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))