shicai / DenseNet-Caffe

DenseNet Caffe Models, converted from https://github.com/liuzhuang13/DenseNet

Model failed to predict the correct class #4

Closed: flyyufelix closed this issue 7 years ago

flyyufelix commented 7 years ago

I saw that you use a Convolution layer instead of an InnerProduct layer for "fc6". Though it looks a bit unusual, I am okay with that.

So I added a softmax layer after fc6 to run classification on my own test image (a cat image), but the model failed to predict the correct class. It might be due to one of two reasons:

1) For image preprocessing, do you subtract the mean pixel values and scale by 0.017 as below?

im[:,:,0] = (im[:,:,0] - 103.94) / 0.017
im[:,:,1] = (im[:,:,1] - 116.78) / 0.017
im[:,:,2] = (im[:,:,2] - 123.68) / 0.017

2) Do you use the same classes.txt (class mapping file) as Inception and ResNet? (i.e. this one)

shicai commented 7 years ago

scale is a multiplication op, like this:

im[:,:,0] = (im[:,:,0] - 103.94) * 0.017

shicai commented 7 years ago

if you are using caffe, you can do it like this:

  transform_param {
    scale: 0.017
    mirror: false
    crop_size: 224
    mean_value: [103.94,116.78,123.68]
  }
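
For reference, here is a rough NumPy sketch of what this transform_param does outside of Caffe, assuming the image is already a 224x224, HxWx3 float array in BGR order (e.g. loaded with cv2.imread and center-cropped); the function name is just for illustration:

import numpy as np

def preprocess(im_bgr):
    # im_bgr: 224 x 224 x 3 float32 array, BGR order, values in [0, 255]
    im = im_bgr.astype(np.float32)
    im -= np.array([103.94, 116.78, 123.68], dtype=np.float32)  # subtract the per-channel BGR mean
    im *= 0.017                                                 # then multiply by the scale factor
    return im.transpose(2, 0, 1)[np.newaxis, ...]               # HWC -> NCHW for the 'data' blob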
flyyufelix commented 7 years ago

It works now. Thanks!

JinmingZhao commented 6 years ago

Hi @shicai @flyyufelix, I have the same problem: the prediction result is not correct. I use the labels from https://gist.github.com/shicai/fa9f98edc23521382955d4731636d1af and my test code is below. Thanks!

import sys

import caffe
import cv2
import numpy as np

def test_prediction():
    model_path = 'DenseNet_161.caffemodel'
    proto_path = 'DenseNet_161.prototxt'
    img_path = sys.argv[1]

    # initilize() and resize_shape() are user-defined helpers
    net = initilize(prototext_path=proto_path, model_path=model_path, gpuId=0)

    int2label_path = 'caffe_image_labels'
    with open(int2label_path, 'r') as f:
        lines = f.readlines()
    int2label = [line.strip() for line in lines]

    # caffe.io.load_image returns RGB in [0, 1], H x W x 3
    img = caffe.io.load_image(img_path)
    nh, nw = resize_shape(img, min_size=256)  # keep aspect ratio
    # nh, nw = 224, 224

    # ref: https://github.com/flyyufelix/DenseNet-Keras/blob/master/test_inference.py
    # cv2.imread returns BGR in [0, 255]
    img = cv2.resize(cv2.imread(img_path), (nw, nh)).astype(np.float32)
    img[:, :, 0] = (img[:, :, 0] - 103.94) * 0.017
    img[:, :, 1] = (img[:, :, 1] - 116.78) * 0.017
    img[:, :, 2] = (img[:, :, 2] - 123.68) * 0.017
    print(img.shape)
    img = img.transpose((2, 0, 1))  # HWC -> CHW
    transformed_img = img

    print(transformed_img, transformed_img.shape)
    net.blobs['data'].reshape(1, 3, nh, nw)
    net.blobs['data'].data[...] = transformed_img
    # print(net.blobs['data'].data[...])
    print('forward')
    net.forward()
    ft = net.blobs['conv5_blk/bn'].data  # relu5_blk
    print(ft.shape)

    prob = net.blobs['fc6'].data
    print(prob.shape)
    prob = np.reshape(prob, (1000,))
    prob = np.exp(prob) / np.sum(np.exp(prob))  # softmax over the 1000 logits
    print(prob[249], prob[251], np.max(prob))
    print(int2label[np.argmax(prob)])
shicai commented 6 years ago

first, add a softmax layer to the end of your deploy.prototxt:

layer {
  name: "prob"
  top: "prob"
  type: "Softmax"
  bottom: "fc6"
}
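
If you'd rather not edit the prototxt, the same probabilities can also be computed from the fc6 blob directly; a minimal NumPy sketch, assuming net.forward() has already been called:

import numpy as np

logits = net.blobs['fc6'].data.reshape(-1)    # 1000 class scores
logits = logits - logits.max()                # subtract the max for numerical stability
prob = np.exp(logits) / np.exp(logits).sum()  # softmax over the 1000 classes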

then, try the following script:

import caffe
import numpy as np

# assumes `net` has already been loaded from the deploy prototxt and caffemodel
nh, nw = 224, 224
im = caffe.io.load_image(img_path)
im = caffe.io.resize_image(im, [nh, nw])

img_mean = np.array([103.939, 116.779, 123.68], dtype=np.float32)  # BGR mean
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))     # HWC -> CHW
transformer.set_channel_swap('data', (2, 1, 0))  # RGB to BGR
transformer.set_raw_scale('data', 255)           # [0,1] to [0,255]
transformer.set_mean('data', img_mean)
transformer.set_input_scale('data', 0.017)       # multiply after mean subtraction

net.blobs['data'].reshape(1, 3, nh, nw)
net.blobs['data'].data[...] = transformer.preprocess('data', im)
out = net.forward()
prob = out['prob']
prob = np.squeeze(prob)

idx = np.argsort(-prob)
print(idx[0:5])
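
To turn those indices into class names, something like the following should work, assuming the labels from https://gist.github.com/shicai/fa9f98edc23521382955d4731636d1af are saved one per line (as in the caffe_image_labels file above):

with open('caffe_image_labels', 'r') as f:   # one label per line, in class-index order
    labels = [line.strip() for line in f]

for i in idx[0:5]:
    print(labels[i], prob[i])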
JinmingZhao commented 6 years ago

Thanks for your reply, but the prediction is still wrong:

import sys

import caffe
import numpy as np

def test_prediction():
    model_path = 'DenseNet_161.caffemodel'
    proto_path = 'DenseNet_161.prototxt'
    img_path = sys.argv[1]

    # initilize() is a user-defined helper that loads the net and sets the GPU
    net = initilize(prototext_path=proto_path, model_path=model_path, gpuId=0)

    int2label_path = 'caffe_image_labels'
    with open(int2label_path, 'r') as f:
        lines = f.readlines()
    int2label = [line.strip() for line in lines]
    int2label = np.asarray(int2label)

    nh, nw = 224, 224
    im = caffe.io.load_image(img_path)
    im = caffe.io.resize_image(im, [nh, nw])

    img_mean = np.array([103.939, 116.779, 123.68], dtype=np.float32)  # BGR mean
    transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
    transformer.set_transpose('data', (2, 0, 1))     # HWC -> CHW
    transformer.set_channel_swap('data', (2, 1, 0))  # RGB to BGR
    transformer.set_raw_scale('data', 255)           # [0,1] to [0,255]
    transformer.set_mean('data', img_mean)
    transformer.set_input_scale('data', 0.017)       # multiply after mean subtraction

    net.blobs['data'].reshape(1, 3, nh, nw)
    net.blobs['data'].data[...] = transformer.preprocess('data', im)
    out = net.forward()
    prob = out['prob']
    prob = np.squeeze(prob)

    idx = np.argsort(-prob)
    print(idx[0:5])
    print(int2label[idx[0:5]])

The model was downloaded from the Baidu disk, the test picture is cat.jpg, and the labels are from https://gist.github.com/shicai/fa9f98edc23521382955d4731636d1af. The prediction result is:

[892 681 916 644 650]
["'n04548280 wall clock'" "'n03832673 notebook, notebook computer'" "'n06359193 web site, website, internet site, site'" "'n03729826 matchstick'" "'n03759954 microphone, mike'"]

Thanks, Jinming

shicai commented 6 years ago

maybe your model is corrupted, please download the model again.

JinmingZhao commented 6 years ago

Hi @shicai, I tested the DenseNet_121 model and its predictions are OK, but I have re-downloaded DenseNet_161 and it still does not work. Could you please check the DenseNet_161 model on the Baidu disk?

shicai commented 6 years ago

It should be OK, since it has been tested by others on GitHub. Or you can try to download the model from Google Drive.

JinmingZhao commented 6 years ago

@shicai I have tested the model from Google Drive and the result is the same: still wrong. But if I just change 161 to 121, the result is correct. Could you test the 161 model?

JinmingZhao commented 6 years ago

And I will download the 169 and 201 models to verify this problem.

JinmingZhao commented 6 years ago

@shicai
Here are some test results.

With image shape 224 x 224: the {121, 169, 201} models are OK, but the 161 model (Baidu disk and Google Drive) cannot predict cat.jpg or a dog picture (whole body). It can predict a husky dog (head only); the top-5 results are ["'n02108915 French bulldog'" "'n02110185 Siberian husky'" "'n02808304 bath towel'" "'n02085620 Chihuahua'" "'n02097298 Scotch terrier, Scottish terrier, Scottie'"].

With image shape 256 x ? (256 is the smaller edge; the image is resized to keep the aspect ratio): the {121, 169, 201} models are OK, but the 161 model does not work.

So is there something wrong with the 161 model?

shicai commented 6 years ago

please check this: md5sum DenseNet_161.caffemodel = 26fee6531e67a7c239e10fa009ca2a57
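
A quick way to check it from Python is a small hashlib sketch like this (assuming the model file is in the current directory):

import hashlib

with open('DenseNet_161.caffemodel', 'rb') as f:
    md5 = hashlib.md5(f.read()).hexdigest()  # reads the whole file into memory
print(md5)  # should print 26fee6531e67a7c239e10fa009ca2a57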

shicai commented 6 years ago

I downloaded DenseNet161 from Google Drive. It works well, and the top 5 predicted labels are: [ 0 1 389 397 392]

JinmingZhao commented 6 years ago

26fee6531e67a7c239e10fa009ca2a57  DenseNet_161.caffemodel (Google Drive)
26fee6531e67a7c239e10fa009ca2a57  DenseNet_161.caffemodel.old (Baidu disk)

shicai commented 6 years ago

No, I used a tench image (n01440764_37.JPEG) from the ImageNet dataset.

shicai commented 6 years ago

when using cat.jpg in caffe examples, the result is [282 285 281 263 287]

JinmingZhao commented 6 years ago

Oh, I found the reason. I had modified the prototxt from

name: "DENSENET_161"
input: "data"
input_dim: 1
input_dim: 3
input_dim: 224
input_dim: 224

to

name: "DENSENET_161"
input: "data"
input_dim: 1
input_dim: 3
input_dim: 1
input_dim: 1

I did this because I had seen a prototxt in openpose written this way:

input: "image"
input_dim: 1
input_dim: 3
input_dim: 1 # This value will be defined at runtime
input_dim: 1 # This value will be defined at runtime

So I thought this would be OK. Would you please explain the reason? I haven't used Caffe before, so I don't understand some of the details.

Thank you very much! Sorry, it was my mistake.

shicai commented 6 years ago

The four values of input_dim indicate N, C, H, W respectively. N=1 means a single image, and C=3 means a color image with three channels. The last two values are the height and width of the input image. openpose uses a customized Caffe, so maybe some image processing steps have been changed.
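
For context, a short pycaffe sketch of how a runtime-sized input is usually handled with a deploy prototxt like this one (a sketch, not a required step): reshape the data blob to the real image size before the forward pass, and build any caffe.io.Transformer from the updated blob shape rather than from placeholder 1x1 dims, since preprocess() resizes the image to whatever shape the Transformer was given.

import caffe

net = caffe.Net('DenseNet_161.prototxt', 'DenseNet_161.caffemodel', caffe.TEST)

nh, nw = 224, 224
# give the data blob its real size first...
net.blobs['data'].reshape(1, 3, nh, nw)
net.reshape()

# ...then build the Transformer from the final shape, so preprocess()
# will resize images to 224x224 instead of the placeholder dims
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})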