Converted ResNet-18 from MXNet predicts labels incorrectly

prashant-puri commented 6 years ago

Platform: Ubuntu 14.04

Python version: 2.7

Caffe: 1.0.0 (CPU)

Pre-trained MXNet model (symbol and params):
http://data.dmlc.ml/mxnet/models/imagenet/resnet/18-layers/resnet-18-symbol.json
http://data.dmlc.ml/mxnet/models/imagenet/resnet/18-layers/resnet-18-0000.params

I successfully converted the MXNet model to Caffe. The resulting resnet-18.prototxt:

layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param {
    shape {
      dim: 1
      dim: 3
      dim: 224
      dim: 224
    }
  }
}
layer {
  name: "bn_data"
  type: "BatchNorm"
  bottom: "data"
  top: "bn_data"
  batch_norm_param {
    use_global_stats: true
    eps: 1.99999994948e-05
  }
}
layer {
  name: "bn_data_scale"
  type: "Scale"
  bottom: "bn_data"
  top: "bn_data"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "conv0"
  type: "Convolution"
  bottom: "bn_data"
  top: "conv0"
  convolution_param {
    num_output: 64
    bias_term: false
    kernel_size: 7
    group: 1
    stride: 2
    pad_h: 3
    pad_w: 3
  }
}
layer {
  name: "bn0"
  type: "BatchNorm"
  bottom: "conv0"
  top: "bn0"
  batch_norm_param {
    use_global_stats: true
    eps: 1.99999994948e-05
  }
}
layer {
  name: "bn0_scale"
  type: "Scale"
  bottom: "bn0"
  top: "bn0"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu0"
  type: "ReLU"
  bottom: "bn0"
  top: "bn0"
}
layer {
  name: "pooling0"
  type: "Pooling"
  bottom: "bn0"
  top: "pooling0"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
    pad_h: 1
    pad_w: 1
  }
}
layer {
  name: "stage1_unit1_bn1"
  type: "BatchNorm"
  bottom: "pooling0"
  top: "stage1_unit1_bn1"
  batch_norm_param {
    use_global_stats: true
    eps: 1.99999994948e-05
  }
}
layer {
  name: "stage1_unit1_bn1_scale"
  type: "Scale"
  bottom: "stage1_unit1_bn1"
  top: "stage1_unit1_bn1"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "stage1_unit1_relu1"
  type: "ReLU"
  bottom: "stage1_unit1_bn1"
  top: "stage1_unit1_bn1"
}
layer {
  name: "stage1_unit1_conv1"
  type: "Convolution"
  bottom: "stage1_unit1_bn1"
  top: "stage1_unit1_conv1"
  convolution_param {
    num_output: 64
    bias_term: false
    kernel_size: 3
    group: 1
    stride: 1
    pad_h: 1
    pad_w: 1
  }
}
layer {
  name: "stage1_unit1_sc"
  type: "Convolution"
  bottom: "stage1_unit1_bn1"
  top: "stage1_unit1_sc"
  convolution_param {
    num_output: 64
    bias_term: false
    kernel_size: 1
    group: 1
    stride: 1
    pad_h: 0
    pad_w: 0
  }
}
layer {
  name: "stage1_unit1_bn2"
  type: "BatchNorm"
  bottom: "stage1_unit1_conv1"
  top: "stage1_unit1_bn2"
  batch_norm_param {
    use_global_stats: true
    eps: 1.99999994948e-05
  }
}
layer {
  name: "stage1_unit1_bn2_scale"
  type: "Scale"
  bottom: "stage1_unit1_bn2"
  top: "stage1_unit1_bn2"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "stage1_unit1_relu2"
  type: "ReLU"
  bottom: "stage1_unit1_bn2"
  top: "stage1_unit1_bn2"
}
layer {
  name: "stage1_unit1_conv2"
  type: "Convolution"
  bottom: "stage1_unit1_bn2"
  top: "stage1_unit1_conv2"
  convolution_param {
    num_output: 64
    bias_term: false
    kernel_size: 3
    group: 1
    stride: 1
    pad_h: 1
    pad_w: 1
  }
}
layer {
  name: "plus0"
  type: "Eltwise"
  bottom: "stage1_unit1_conv2"
  bottom: "stage1_unit1_sc"
  top: "plus0"
  eltwise_param {
    operation: SUM
  }
}
layer {
  name: "stage1_unit2_bn1"
  type: "BatchNorm"
  bottom: "plus0"
  top: "stage1_unit2_bn1"
  batch_norm_param {
    use_global_stats: true
    eps: 1.99999994948e-05
  }
}
layer {
  name: "stage1_unit2_bn1_scale"
  type: "Scale"
  bottom: "stage1_unit2_bn1"
  top: "stage1_unit2_bn1"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "stage1_unit2_relu1"
  type: "ReLU"
  bottom: "stage1_unit2_bn1"
  top: "stage1_unit2_bn1"
}
layer {
  name: "stage1_unit2_conv1"
  type: "Convolution"
  bottom: "stage1_unit2_bn1"
  top: "stage1_unit2_conv1"
  convolution_param {
    num_output: 64
    bias_term: false
    kernel_size: 3
    group: 1
    stride: 1
    pad_h: 1
    pad_w: 1
  }
}
layer {
  name: "stage1_unit2_bn2"
  type: "BatchNorm"
  bottom: "stage1_unit2_conv1"
  top: "stage1_unit2_bn2"
  batch_norm_param {
    use_global_stats: true
    eps: 1.99999994948e-05
  }
}
layer {
  name: "stage1_unit2_bn2_scale"
  type: "Scale"
  bottom: "stage1_unit2_bn2"
  top: "stage1_unit2_bn2"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "stage1_unit2_relu2"
  type: "ReLU"
  bottom: "stage1_unit2_bn2"
  top: "stage1_unit2_bn2"
}
layer {
  name: "stage1_unit2_conv2"
  type: "Convolution"
  bottom: "stage1_unit2_bn2"
  top: "stage1_unit2_conv2"
  convolution_param {
    num_output: 64
    bias_term: false
    kernel_size: 3
    group: 1
    stride: 1
    pad_h: 1
    pad_w: 1
  }
}
layer {
  name: "plus1"
  type: "Eltwise"
  bottom: "stage1_unit2_conv2"
  bottom: "plus0"
  top: "plus1"
  eltwise_param {
    operation: SUM
  }
}
layer {
  name: "stage2_unit1_bn1"
  type: "BatchNorm"
  bottom: "plus1"
  top: "stage2_unit1_bn1"
  batch_norm_param {
    use_global_stats: true
    eps: 1.99999994948e-05
  }
}
layer {
  name: "stage2_unit1_bn1_scale"
  type: "Scale"
  bottom: "stage2_unit1_bn1"
  top: "stage2_unit1_bn1"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "stage2_unit1_relu1"
  type: "ReLU"
  bottom: "stage2_unit1_bn1"
  top: "stage2_unit1_bn1"
}
layer {
  name: "stage2_unit1_conv1"
  type: "Convolution"
  bottom: "stage2_unit1_bn1"
  top: "stage2_unit1_conv1"
  convolution_param {
    num_output: 128
    bias_term: false
    kernel_size: 3
    group: 1
    stride: 2
    pad_h: 1
    pad_w: 1
  }
}
layer {
  name: "stage2_unit1_sc"
  type: "Convolution"
  bottom: "stage2_unit1_bn1"
  top: "stage2_unit1_sc"
  convolution_param {
    num_output: 128
    bias_term: false
    kernel_size: 1
    group: 1
    stride: 2
    pad_h: 0
    pad_w: 0
  }
}
layer {
  name: "stage2_unit1_bn2"
  type: "BatchNorm"
  bottom: "stage2_unit1_conv1"
  top: "stage2_unit1_bn2"
  batch_norm_param {
    use_global_stats: true
    eps: 1.99999994948e-05
  }
}
layer {
  name: "stage2_unit1_bn2_scale"
  type: "Scale"
  bottom: "stage2_unit1_bn2"
  top: "stage2_unit1_bn2"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "stage2_unit1_relu2"
  type: "ReLU"
  bottom: "stage2_unit1_bn2"
  top: "stage2_unit1_bn2"
}
layer {
  name: "stage2_unit1_conv2"
  type: "Convolution"
  bottom: "stage2_unit1_bn2"
  top: "stage2_unit1_conv2"
  convolution_param {
    num_output: 128
    bias_term: false
    kernel_size: 3
    group: 1
    stride: 1
    pad_h: 1
    pad_w: 1
  }
}
layer {
  name: "plus2"
  type: "Eltwise"
  bottom: "stage2_unit1_conv2"
  bottom: "stage2_unit1_sc"
  top: "plus2"
  eltwise_param {
    operation: SUM
  }
}
layer {
  name: "stage2_unit2_bn1"
  type: "BatchNorm"
  bottom: "plus2"
  top: "stage2_unit2_bn1"
  batch_norm_param {
    use_global_stats: true
    eps: 1.99999994948e-05
  }
}
layer {
  name: "stage2_unit2_bn1_scale"
  type: "Scale"
  bottom: "stage2_unit2_bn1"
  top: "stage2_unit2_bn1"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "stage2_unit2_relu1"
  type: "ReLU"
  bottom: "stage2_unit2_bn1"
  top: "stage2_unit2_bn1"
}
layer {
  name: "stage2_unit2_conv1"
  type: "Convolution"
  bottom: "stage2_unit2_bn1"
  top: "stage2_unit2_conv1"
  convolution_param {
    num_output: 128
    bias_term: false
    kernel_size: 3
    group: 1
    stride: 1
    pad_h: 1
    pad_w: 1
  }
}
layer {
  name: "stage2_unit2_bn2"
  type: "BatchNorm"
  bottom: "stage2_unit2_conv1"
  top: "stage2_unit2_bn2"
  batch_norm_param {
    use_global_stats: true
    eps: 1.99999994948e-05
  }
}
layer {
  name: "stage2_unit2_bn2_scale"
  type: "Scale"
  bottom: "stage2_unit2_bn2"
  top: "stage2_unit2_bn2"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "stage2_unit2_relu2"
  type: "ReLU"
  bottom: "stage2_unit2_bn2"
  top: "stage2_unit2_bn2"
}
layer {
  name: "stage2_unit2_conv2"
  type: "Convolution"
  bottom: "stage2_unit2_bn2"
  top: "stage2_unit2_conv2"
  convolution_param {
    num_output: 128
    bias_term: false
    kernel_size: 3
    group: 1
    stride: 1
    pad_h: 1
    pad_w: 1
  }
}
layer {
  name: "plus3"
  type: "Eltwise"
  bottom: "stage2_unit2_conv2"
  bottom: "plus2"
  top: "plus3"
  eltwise_param {
    operation: SUM
  }
}
layer {
  name: "stage3_unit1_bn1"
  type: "BatchNorm"
  bottom: "plus3"
  top: "stage3_unit1_bn1"
  batch_norm_param {
    use_global_stats: true
    eps: 1.99999994948e-05
  }
}
layer {
  name: "stage3_unit1_bn1_scale"
  type: "Scale"
  bottom: "stage3_unit1_bn1"
  top: "stage3_unit1_bn1"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "stage3_unit1_relu1"
  type: "ReLU"
  bottom: "stage3_unit1_bn1"
  top: "stage3_unit1_bn1"
}
layer {
  name: "stage3_unit1_conv1"
  type: "Convolution"
  bottom: "stage3_unit1_bn1"
  top: "stage3_unit1_conv1"
  convolution_param {
    num_output: 256
    bias_term: false
    kernel_size: 3
    group: 1
    stride: 2
    pad_h: 1
    pad_w: 1
  }
}
layer {
  name: "stage3_unit1_sc"
  type: "Convolution"
  bottom: "stage3_unit1_bn1"
  top: "stage3_unit1_sc"
  convolution_param {
    num_output: 256
    bias_term: false
    kernel_size: 1
    group: 1
    stride: 2
    pad_h: 0
    pad_w: 0
  }
}
layer {
  name: "stage3_unit1_bn2"
  type: "BatchNorm"
  bottom: "stage3_unit1_conv1"
  top: "stage3_unit1_bn2"
  batch_norm_param {
    use_global_stats: true
    eps: 1.99999994948e-05
  }
}
layer {
  name: "stage3_unit1_bn2_scale"
  type: "Scale"
  bottom: "stage3_unit1_bn2"
  top: "stage3_unit1_bn2"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "stage3_unit1_relu2"
  type: "ReLU"
  bottom: "stage3_unit1_bn2"
  top: "stage3_unit1_bn2"
}
layer {
  name: "stage3_unit1_conv2"
  type: "Convolution"
  bottom: "stage3_unit1_bn2"
  top: "stage3_unit1_conv2"
  convolution_param {
    num_output: 256
    bias_term: false
    kernel_size: 3
    group: 1
    stride: 1
    pad_h: 1
    pad_w: 1
  }
}
layer {
  name: "plus4"
  type: "Eltwise"
  bottom: "stage3_unit1_conv2"
  bottom: "stage3_unit1_sc"
  top: "plus4"
  eltwise_param {
    operation: SUM
  }
}
layer {
  name: "stage3_unit2_bn1"
  type: "BatchNorm"
  bottom: "plus4"
  top: "stage3_unit2_bn1"
  batch_norm_param {
    use_global_stats: true
    eps: 1.99999994948e-05
  }
}
layer {
  name: "stage3_unit2_bn1_scale"
  type: "Scale"
  bottom: "stage3_unit2_bn1"
  top: "stage3_unit2_bn1"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "stage3_unit2_relu1"
  type: "ReLU"
  bottom: "stage3_unit2_bn1"
  top: "stage3_unit2_bn1"
}
layer {
  name: "stage3_unit2_conv1"
  type: "Convolution"
  bottom: "stage3_unit2_bn1"
  top: "stage3_unit2_conv1"
  convolution_param {
    num_output: 256
    bias_term: false
    kernel_size: 3
    group: 1
    stride: 1
    pad_h: 1
    pad_w: 1
  }
}
layer {
  name: "stage3_unit2_bn2"
  type: "BatchNorm"
  bottom: "stage3_unit2_conv1"
  top: "stage3_unit2_bn2"
  batch_norm_param {
    use_global_stats: true
    eps: 1.99999994948e-05
  }
}
layer {
  name: "stage3_unit2_bn2_scale"
  type: "Scale"
  bottom: "stage3_unit2_bn2"
  top: "stage3_unit2_bn2"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "stage3_unit2_relu2"
  type: "ReLU"
  bottom: "stage3_unit2_bn2"
  top: "stage3_unit2_bn2"
}
layer {
  name: "stage3_unit2_conv2"
  type: "Convolution"
  bottom: "stage3_unit2_bn2"
  top: "stage3_unit2_conv2"
  convolution_param {
    num_output: 256
    bias_term: false
    kernel_size: 3
    group: 1
    stride: 1
    pad_h: 1
    pad_w: 1
  }
}
layer {
  name: "plus5"
  type: "Eltwise"
  bottom: "stage3_unit2_conv2"
  bottom: "plus4"
  top: "plus5"
  eltwise_param {
    operation: SUM
  }
}
layer {
  name: "stage4_unit1_bn1"
  type: "BatchNorm"
  bottom: "plus5"
  top: "stage4_unit1_bn1"
  batch_norm_param {
    use_global_stats: true
    eps: 1.99999994948e-05
  }
}
layer {
  name: "stage4_unit1_bn1_scale"
  type: "Scale"
  bottom: "stage4_unit1_bn1"
  top: "stage4_unit1_bn1"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "stage4_unit1_relu1"
  type: "ReLU"
  bottom: "stage4_unit1_bn1"
  top: "stage4_unit1_bn1"
}
layer {
  name: "stage4_unit1_conv1"
  type: "Convolution"
  bottom: "stage4_unit1_bn1"
  top: "stage4_unit1_conv1"
  convolution_param {
    num_output: 512
    bias_term: false
    kernel_size: 3
    group: 1
    stride: 2
    pad_h: 1
    pad_w: 1
  }
}
layer {
  name: "stage4_unit1_sc"
  type: "Convolution"
  bottom: "stage4_unit1_bn1"
  top: "stage4_unit1_sc"
  convolution_param {
    num_output: 512
    bias_term: false
    kernel_size: 1
    group: 1
    stride: 2
    pad_h: 0
    pad_w: 0
  }
}
layer {
  name: "stage4_unit1_bn2"
  type: "BatchNorm"
  bottom: "stage4_unit1_conv1"
  top: "stage4_unit1_bn2"
  batch_norm_param {
    use_global_stats: true
    eps: 1.99999994948e-05
  }
}
layer {
  name: "stage4_unit1_bn2_scale"
  type: "Scale"
  bottom: "stage4_unit1_bn2"
  top: "stage4_unit1_bn2"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "stage4_unit1_relu2"
  type: "ReLU"
  bottom: "stage4_unit1_bn2"
  top: "stage4_unit1_bn2"
}
layer {
  name: "stage4_unit1_conv2"
  type: "Convolution"
  bottom: "stage4_unit1_bn2"
  top: "stage4_unit1_conv2"
  convolution_param {
    num_output: 512
    bias_term: false
    kernel_size: 3
    group: 1
    stride: 1
    pad_h: 1
    pad_w: 1
  }
}
layer {
  name: "plus6"
  type: "Eltwise"
  bottom: "stage4_unit1_conv2"
  bottom: "stage4_unit1_sc"
  top: "plus6"
  eltwise_param {
    operation: SUM
  }
}
layer {
  name: "stage4_unit2_bn1"
  type: "BatchNorm"
  bottom: "plus6"
  top: "stage4_unit2_bn1"
  batch_norm_param {
    use_global_stats: true
    eps: 1.99999994948e-05
  }
}
layer {
  name: "stage4_unit2_bn1_scale"
  type: "Scale"
  bottom: "stage4_unit2_bn1"
  top: "stage4_unit2_bn1"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "stage4_unit2_relu1"
  type: "ReLU"
  bottom: "stage4_unit2_bn1"
  top: "stage4_unit2_bn1"
}
layer {
  name: "stage4_unit2_conv1"
  type: "Convolution"
  bottom: "stage4_unit2_bn1"
  top: "stage4_unit2_conv1"
  convolution_param {
    num_output: 512
    bias_term: false
    kernel_size: 3
    group: 1
    stride: 1
    pad_h: 1
    pad_w: 1
  }
}
layer {
  name: "stage4_unit2_bn2"
  type: "BatchNorm"
  bottom: "stage4_unit2_conv1"
  top: "stage4_unit2_bn2"
  batch_norm_param {
    use_global_stats: true
    eps: 1.99999994948e-05
  }
}
layer {
  name: "stage4_unit2_bn2_scale"
  type: "Scale"
  bottom: "stage4_unit2_bn2"
  top: "stage4_unit2_bn2"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "stage4_unit2_relu2"
  type: "ReLU"
  bottom: "stage4_unit2_bn2"
  top: "stage4_unit2_bn2"
}
layer {
  name: "stage4_unit2_conv2"
  type: "Convolution"
  bottom: "stage4_unit2_bn2"
  top: "stage4_unit2_conv2"
  convolution_param {
    num_output: 512
    bias_term: false
    kernel_size: 3
    group: 1
    stride: 1
    pad_h: 1
    pad_w: 1
  }
}
layer {
  name: "plus7"
  type: "Eltwise"
  bottom: "stage4_unit2_conv2"
  bottom: "plus6"
  top: "plus7"
  eltwise_param {
    operation: SUM
  }
}
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "plus7"
  top: "bn1"
  batch_norm_param {
    use_global_stats: true
    eps: 1.99999994948e-05
  }
}
layer {
  name: "bn1_scale"
  type: "Scale"
  bottom: "bn1"
  top: "bn1"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "bn1"
  top: "bn1"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "bn1"
  top: "pool1"
  pooling_param {
    pool: AVE
    stride: 1
    global_pooling: true
  }
}
layer {
  name: "fc1"
  type: "InnerProduct"
  bottom: "pool1"
  top: "fc1"
  inner_product_param {
    num_output: 1000
    bias_term: true
  }
}
layer {
  name: "softmax"
  type: "Softmax"
  bottom: "fc1"
  top: "softmax"
}
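One thing worth checking in a converted graph like this is whether every layer produces the same feature-map size as in MXNet. Below is a minimal sketch, assuming Caffe's default Pooling rounding (ceil, then dropping a last window that would start inside the padding) and MXNet's default `pooling_convention='valid'` (floor). The helper names `conv_out`, `caffe_pool_out`, `mxnet_pool_out` and `trace` are hypothetical; the kernel, stride and pad values are copied from the prototxt above. It only compares shapes and does not show that this is the cause of the wrong labels.

```python
import math


def conv_out(size, kernel, stride, pad):
    """Convolution output size; Caffe and MXNet both round down (floor)."""
    return (size + 2 * pad - kernel) // stride + 1


def caffe_pool_out(size, kernel, stride, pad):
    """Caffe Pooling output size: round up (ceil), then drop a last window
    that would start inside the trailing padding."""
    out = int(math.ceil(float(size + 2 * pad - kernel) / stride)) + 1
    if pad > 0 and (out - 1) * stride >= size + pad:
        out -= 1
    return out


def mxnet_pool_out(size, kernel, stride, pad):
    """MXNet Pooling output size with the default pooling_convention='valid' (floor)."""
    return (size + 2 * pad - kernel) // stride + 1


def trace(pool_out, size=224):
    """Spatial size after conv0, pooling0 and the strided unit1_conv1 of stages 2-4.

    The stride-1, pad-1, 3x3 convolutions in between keep the size unchanged,
    so they are skipped here.
    """
    size = conv_out(size, kernel=7, stride=2, pad=3)   # conv0
    sizes = [size]
    size = pool_out(size, kernel=3, stride=2, pad=1)   # pooling0
    sizes.append(size)
    for _ in range(3):                                 # stage2/3/4_unit1_conv1
        size = conv_out(size, kernel=3, stride=2, pad=1)
        sizes.append(size)
    return sizes


print("Caffe: {}".format(trace(caffe_pool_out)))  # Caffe: [112, 57, 29, 15, 8]
print("MXNet: {}".format(trace(mxnet_pool_out)))  # MXNet: [112, 56, 28, 14, 7]
```

Under these assumptions the two conventions disagree at `pooling0` (57 vs 56 for a 224x224 input), and the difference carries through every later stage. The network still runs because `pool1` uses `global_pooling`, which reduces either size to 1x1 before `fc1`.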

Then I tried to classify an image using the script below:

import numpy as np
import caffe

prototxt_filename = 'path_to_resnet18/resnet18.prototxt'
caffemodel_filename = 'path_to_resnet18/resnet18.caffemodel'
base_dir = 'path_to_caffe/'  # placeholder: Caffe root containing data/ilsvrc12/synset_words.txt

net = caffe.Net(prototxt_filename,
                caffemodel_filename,
                caffe.TEST)

# load input and configure preprocessing
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))              # HxWxC -> CxHxW
transformer.set_mean('data', np.array([104, 117, 123]))   # per-channel mean, BGR order
transformer.set_raw_scale('data', 255.0)                  # [0, 1] -> [0, 255]
transformer.set_channel_swap('data', (2, 1, 0))           # RGB -> BGR
net.blobs['data'].reshape(1, 3, 224, 224)

image_name = 'http://farm4.static.flickr.com/3170/2533026039_a4d72913ec.jpg'

im = caffe.io.load_image(image_name)  # RGB float image in [0, 1], shape HxWx3
net.blobs['data'].data[...] = transformer.preprocess('data', im)

out = net.forward()

output_prob = out['softmax'][0]

# print the top-5 predicted labels
labels = np.loadtxt(base_dir + "data/ilsvrc12/synset_words.txt", str, delimiter='\t')
top_k = output_prob.flatten().argsort()[::-1][:5]

for prob, class_ in zip(output_prob[top_k], labels[top_k]):
    print("prob={} class={}".format(prob, class_))

Output:

prob=1.0 class=n03250847 drumstick
prob=0.0 class=n15075141 toilet tissue, toilet paper, bathroom tissue
prob=0.0 class=n02317335 starfish, sea star
prob=0.0 class=n02389026 sorrel
prob=0.0 class=n02364673 guinea pig, Cavia cobaya

The top prediction is drumstick with probability 1.0, which is wrong.
Please help.
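A probability of exactly 1.0 with 0.0 for every other class means the top logit of `fc1` exceeds all the others by a wide margin. The sketch below illustrates this with a numerically stable softmax in plain Python; the logits are made up, and the sketch says nothing about why the converted model's logits are so far apart.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three classes, then the same logits scaled up 20x.
logits = [2.0, 1.0, 0.5]
for scale in (1, 20):
    probs = softmax([scale * z for z in logits])
    print("scale={} probs={}".format(scale, [round(p, 6) for p in probs]))
```

With `scale=20` the rounded probabilities come out as `[1.0, 0.0, 0.0]`, the same pattern as the first output above, even though the ranking of the classes is unchanged.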

kitstar commented 6 years ago

Hi @prashant-puri, the MXNet ResNet model doesn't use any preprocessing. You can remove these lines

transformer.set_mean('data', np.array([104, 117, 123]))
transformer.set_raw_scale('data', 255.0)
transformer.set_channel_swap('data', (2,1,0))     # Not sure about this one

and see if it works.
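For reference, here is a NumPy sketch that mimics what these Transformer calls do, in the order `caffe.io.Transformer.preprocess` applies them (transpose, channel swap, raw scale, mean subtraction; resizing omitted), so the effect of removing each line can be checked without Caffe. The function `preprocess` and the 1x1 test image are hypothetical. One caveat, stated as an assumption: `caffe.io.load_image` returns RGB floats in [0, 1], so if this MXNet model expects RGB pixel values in [0, 255] (MXNet's own pretrained-model prediction tutorial feeds raw 0-255 RGB pixels), `set_raw_scale('data', 255.0)` would still be needed after removing the mean and the channel swap.

```python
import numpy as np


def preprocess(im, channel_swap=None, raw_scale=None, mean=None):
    """Mimic caffe.io.Transformer.preprocess (without resizing) on an HxWxC image."""
    x = im.astype(np.float32).transpose(2, 0, 1)   # set_transpose((2,0,1)): HxWxC -> CxHxW
    if channel_swap is not None:
        x = x[list(channel_swap), :, :]             # set_channel_swap: reorder channels
    if raw_scale is not None:
        x = x * raw_scale                           # set_raw_scale: [0, 1] -> [0, raw_scale]
    if mean is not None:
        x = x - np.asarray(mean, dtype=np.float32).reshape(-1, 1, 1)  # set_mean, per channel
    return x


# A 1x1 RGB "image" in [0, 1], the range caffe.io.load_image returns.
im = np.array([[[1.0, 0.5, 0.0]]])

original = preprocess(im, channel_swap=(2, 1, 0), raw_scale=255.0, mean=[104, 117, 123])
all_removed = preprocess(im)
rgb_0_255 = preprocess(im, raw_scale=255.0)  # assumption: RGB in [0, 255], no mean

print(original.ravel())
print(all_removed.ravel())
print(rgb_0_255.ravel())
```

The single pixel becomes [-104, 10.5, 132] with the original configuration, [1.0, 0.5, 0.0] with all three lines removed, and [255, 127.5, 0] when only the raw scale is kept.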

prashant-puri commented 6 years ago

Hi @kitstar, I really appreciate your answer. I commented out all three lines as suggested. The script runs, but the results are still wrong. Here are the results after removing the code you suggested; the input image is the same as before: image_name = 'http://farm4.static.flickr.com/3170/2533026039_a4d72913ec.jpg'


prob=0.999646782875 class=n04286575 spotlight, spot
prob=0.000113813846838 class=n04515003 upright, upright piano
prob=7.75204971433e-05 class=n03759954 microphone, mike
prob=5.58655337954e-05 class=n03666591 lighter, light, igniter, ignitor
prob=4.96878346894e-05 class=n03483316 hand blower, blow dryer, blow drier, hair dryer, hair drier

kitstar commented 6 years ago

Hi @prashant-puri . Found the problem. Will fix soon and ping you asap. Thanks!

prashant-puri commented 6 years ago

@kitstar Thanks! Waiting for your reply :)

kitstar commented 6 years ago

Hi @prashant-puri . Fixed. Please try the newest code again.

microsoft / MMdnn, issue #152