There is only one _minus operation in get_vgg_rcnn. What changes did you make?
@precedenceguo I only changed the class labels in pascal_voc.py to fit my dataset and set config.TEST.CXX_PROPOSAL = False, because 'Symbol' doesn't have the 'Proposal' attribute. Here is the get_vgg_rcnn code I use:
data = mx.symbol.Variable(name="data")
rois = mx.symbol.Variable(name='rois')
label = mx.symbol.Variable(name='label')
bbox_target = mx.symbol.Variable(name='bbox_target')
bbox_weight = mx.symbol.Variable(name='bbox_weight')
# reshape input
rois = mx.symbol.Reshape(data=rois, shape=(-1, 5), name='rois_reshape')
label = mx.symbol.Reshape(data=label, shape=(-1, ), name='label_reshape')
bbox_target = mx.symbol.Reshape(data=bbox_target, shape=(-1, 4 * num_classes), name='bbox_target_reshape')
bbox_weight = mx.symbol.Reshape(data=bbox_weight, shape=(-1, 4 * num_classes), name='bbox_weight_reshape')
# shared convolutional layers
relu5_3 = get_vgg_conv(data)
# Fast R-CNN
pool5 = mx.symbol.ROIPooling(
name='roi_pool5', data=relu5_3, rois=rois, pooled_size=(7, 7), spatial_scale=1.0 / config.RCNN_FEAT_SRTIDE)
# group 6
flatten = mx.symbol.Flatten(data=pool5, name="flatten")
fc6 = mx.symbol.FullyConnected(data=flatten, num_hidden=4096, name="fc6")
relu6 = mx.symbol.Activation(data=fc6, act_type="relu", name="relu6")
drop6 = mx.symbol.Dropout(data=relu6, p=0.5, name="drop6")
# group 7
fc7 = mx.symbol.FullyConnected(data=drop6, num_hidden=4096, name="fc7")
relu7 = mx.symbol.Activation(data=fc7, act_type="relu", name="relu7")
drop7 = mx.symbol.Dropout(data=relu7, p=0.5, name="drop7")
# classification
cls_score = mx.symbol.FullyConnected(name='cls_score', data=drop7, num_hidden=num_classes)
cls_prob = mx.symbol.SoftmaxOutput(name='cls_prob', data=cls_score, label=label, normalization='batch')
# bounding box regression
bbox_pred = mx.symbol.FullyConnected(name='bbox_pred', data=drop7, num_hidden=num_classes * 4)
bbox_loss_ = bbox_weight * mx.symbol.smooth_l1(name='bbox_loss_', scalar=1.0, data=(bbox_pred - bbox_target))
bbox_loss = mx.sym.MakeLoss(name='bbox_loss', data=bbox_loss_, grad_scale=1.0 / config.TRAIN.BATCH_ROIS)
# reshape output
cls_prob = mx.symbol.Reshape(data=cls_prob, shape=(config.TRAIN.BATCH_IMAGES, -1, num_classes), name='cls_prob_reshape')
bbox_loss = mx.symbol.Reshape(data=bbox_loss, shape=(config.TRAIN.BATCH_IMAGES, -1, 4 * num_classes), name='bbox_loss_reshape')
# group output
group = mx.symbol.Group([cls_prob, bbox_loss])
return group
We can see that the only _minus is bbox_pred - bbox_target.
Use sym.tojson() to check where the _minus2 is.
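One way to do that lookup is sketched below. It assumes the graph JSON has a top-level "nodes" list in which each input entry starts with the index of the producing node. The SAMPLE graph and the helper name find_node_inputs are made up for illustration; they are not part of mx-rcnn.

```python
import json

# Hypothetical, trimmed-down graph in the layout that sym.tojson() produces.
SAMPLE = """
{"nodes": [
  {"op": "null", "name": "bbox_pred", "inputs": []},
  {"op": "null", "name": "bbox_target", "inputs": []},
  {"op": "_Minus", "name": "_minus2", "inputs": [[0, 0], [1, 0]]}
]}
"""


def find_node_inputs(graph_json, node_name):
    """Return the names of the nodes that feed the node called node_name."""
    nodes = json.loads(graph_json)["nodes"]
    for node in nodes:
        if node["name"] == node_name:
            # The first element of each input entry indexes into the nodes list.
            return [nodes[entry[0]]["name"] for entry in node["inputs"]]
    raise KeyError(node_name)


print(find_node_inputs(SAMPLE, "_minus2"))
```

With a real symbol, pass sym.tojson() in place of SAMPLE; the first name returned is the lhs and the second is the rhs of the subtraction.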
@precedenceguo Yes, there is indeed only one minus, but its name is _minus2. Part of the .json file looks like this:
{ "op": "null", "param": {}, "name": "bbox_target", "inputs": [], "backward_source_id": -1 },
{ "op": "Reshape", "param": { "keep_highest": "False", "reverse": "False", "shape": "(-1,84)", "target_shape": "(0,0)" }, "name": "bbox_target_reshape", "inputs": [[83, 0]], "backward_source_id": -1 },
{ "op": "_Minus", "param": {}, "name": "_minus2", "inputs": [[82, 0], [84, 0]], "backward_source_id": -1 },
{ "op": "smooth_l1", "param": {"scalar": "1"}, "name": "bbox_loss_", "inputs": [[85, 0]], "backward_source_id": -1 },
{ "op": "_Mul", "param": {}, "name": "_mul2", "inputs": [[79, 0], [86, 0]], "backward_source_id": -1 },
{ "op": "MakeLoss", "param": { "grad_scale": "0.0078125", "normalization": "null", "valid_thresh": "0" }, "name": "bbox_loss", "inputs": [[87, 0]], "backward_source_id": -1 },
OK, so mxnet.base.MXNetError: InferShape Error in _minus2's rhs argument Shape inconsistent, Provided=(1536,12), inferred shape=(256,12) means that lhs is shaped (1536,12) and rhs is shaped (256,12). lhs is bbox_pred and rhs is bbox_target.
Would you please check their shapes again? They come from the bbox_pred fc layer and the bbox_target io, respectively.
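As a rough aid for that check, the sketch below computes how many rows a Reshape to (-1, last_dim) yields, so the row counts on both sides of the subtraction can be compared by hand. The helper flattened_rows and the input shapes are assumptions chosen for illustration. Only the 12 columns (4 * 3 classes) and the row counts 256 and 1536 come from the error message; this does not identify the cause of the mismatch.

```python
from math import prod


def flattened_rows(shape, last_dim):
    """Number of rows after reshaping an array of `shape` to (-1, last_dim)."""
    total = prod(shape)
    if total % last_dim != 0:
        raise ValueError("shape %r cannot be reshaped to (-1, %d)" % (shape, last_dim))
    return total // last_dim


num_classes = 3  # 12 columns in the error message = 4 * num_classes

# Hypothetical inputs: 2 images with 128 ROIs each.
# bbox_pred has one row per ROI, so its row count follows the rois input.
pred_rows = flattened_rows((2, 128, 5), 5)
target_rows = flattened_rows((2, 128, 4 * num_classes), 4 * num_classes)
print(pred_rows, target_rows, pred_rows == target_rows)

# Hypothetical mismatch: rois covering 12 images while bbox_target covers 2.
print(flattened_rows((12, 128, 5), 5), target_rows)
```

If the two row counts differ for the shapes your data iterator actually provides, the subtraction fails shape inference in the same way as the error above.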
Why not check out coco as an example of a different dataset?
@precedenceguo Thank you! I will check out coco and give it a try.
I'm trying to train mx-rcnn on multiple GPUs (4 of them). The script I use is train_alternate.py. When it reaches the rcnn training step, it gives me this error message. Could anyone help me with this? Thanks!