scale is a multiplication op, like this:
im[:,:,0] = (im[:,:,0] - 103.94) * 0.017
If you are using Caffe, you can do it like this:
transform_param {
  scale: 0.017
  mirror: false
  crop_size: 224
  mean_value: [103.94, 116.78, 123.68]
}
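In plain numpy, the same preprocessing looks like this (a minimal sketch; cat.jpg is a placeholder path, and the means are in BGR order because cv2.imread loads BGR, matching Caffe's convention):

import cv2
import numpy as np

im = cv2.imread('cat.jpg').astype(np.float32)  # H x W x 3, BGR order

# Caffe's transform_param subtracts mean_value first, then multiplies by scale.
mean = np.array([103.94, 116.78, 123.68], dtype=np.float32)  # BGR means
im = (im - mean) * 0.017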
It works now. Thanks!
Hi @shicai @flyyufelix, I have the same problem: the prediction result is not correct. I use the labels from https://gist.github.com/shicai/fa9f98edc23521382955d4731636d1af and my test code is below. Thanks!
import sys
import numpy as np
import caffe

def test_prediction():
    model_path = 'DenseNet_161.caffemodel'
    proto_path = 'DenseNet_161.prototxt'
    img_path = sys.argv[1]
    # initilize() and resize_shape() are helper functions defined elsewhere.
    net = initilize(prototext_path=proto_path, model_path=model_path, gpuId=0)

    # Load the ImageNet class labels (index -> synset description).
    int2label_path = 'caffe_image_labels'
    with open(int2label_path, 'r') as f:
        lines = f.readlines()
    int2label = [line.strip() for line in lines]

    # caffe.io.load_image returns RGB in [0,1]; only used to compute the resize shape.
    img = caffe.io.load_image(img_path)  # H x W x 3
    nh, nw = resize_shape(img, min_size=256)  # keep aspect ratio
    # nh, nw = 224, 224
    # ref: https://github.com/flyyufelix/DenseNet-Keras/blob/master/test_inference.py
    import cv2
    img = cv2.resize(cv2.imread(img_path), (nw, nh)).astype(np.float32)  # BGR order

    # Subtract the per-channel BGR mean, then multiply by the 0.017 scale.
    img[:, :, 0] = (img[:, :, 0] - 103.94) * 0.017
    img[:, :, 1] = (img[:, :, 1] - 116.78) * 0.017
    img[:, :, 2] = (img[:, :, 2] - 123.68) * 0.017
    print(img.shape)

    img = img.transpose((2, 0, 1))  # H x W x C -> C x H x W
    transformed_img = img
    print(transformed_img, transformed_img.shape)

    net.blobs['data'].reshape(1, 3, nh, nw)
    net.blobs['data'].data[...] = transformed_img
    # print(net.blobs['data'].data[...])
    print('forward')
    net.forward()

    ft = net.blobs['conv5_blk/bn'].data  # relu5_blk
    print(ft.shape)

    prob = net.blobs['fc6'].data
    print(prob.shape)
    prob = np.reshape(prob, (1000,))
    # Manual softmax over the fc6 logits (subtracting the max first would be
    # more numerically stable).
    prob = np.exp(prob) / np.sum(np.exp(prob))
    print(prob[249], prob[251], np.max(prob))
    print(int2label[np.argmax(prob)])
First, add a softmax layer to the end of your deploy.prototxt:
layer {
  name: "prob"
  top: "prob"
  type: "Softmax"
  bottom: "fc6"
}
Then try the following script:
nh, nw = 224, 224
im = caffe.io.load_image(img_path)
im = caffe.io.resize_image(im, [nh, nw])
img_mean = np.array([103.939, 116.779, 123.68], dtype=np.float32)
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1)) # row to col
transformer.set_channel_swap('data', (2, 1, 0)) # RGB to BGR
transformer.set_raw_scale('data', 255) # [0,1] to [0,255]
transformer.set_mean('data', img_mean)
transformer.set_input_scale('data', 0.017)
net.blobs['data'].reshape(1, 3, nh, nw)
net.blobs['data'].data[...] = transformer.preprocess('data', im)
out = net.forward()
prob = out['prob']
prob = np.squeeze(prob)
idx = np.argsort(-prob)
print(idx[0:5])
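For reference, caffe.io.Transformer.preprocess applies its operations in a fixed order (transpose, channel swap, raw_scale, mean subtraction, input_scale), which is why set_raw_scale(255) and set_input_scale(0.017) combine correctly with the mean here. A minimal numpy sketch of the equivalent computation, assuming im is the RGB [0,1] float array from caffe.io.load_image:

import numpy as np

# Equivalent of transformer.preprocess('data', im) with the settings above.
x = im.transpose((2, 0, 1))          # H x W x C -> C x H x W
x = x[[2, 1, 0], :, :]               # channel swap: RGB -> BGR
x = x * 255.0                        # raw_scale: [0,1] -> [0,255]
x = x - img_mean.reshape((3, 1, 1))  # subtract per-channel BGR mean
x = x * 0.017                        # input_scale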
Thanks for your reply, but the prediction is still wrong:
import sys
import numpy as np
import caffe

def test_prediction():
    model_path = 'DenseNet_161.caffemodel'
    proto_path = 'DenseNet_161.prototxt'
    img_path = sys.argv[1]
    net = initilize(prototext_path=proto_path, model_path=model_path, gpuId=0)

    int2label_path = 'caffe_image_labels'
    with open(int2label_path, 'r') as f:
        lines = f.readlines()
    int2label = [line.strip() for line in lines]
    int2label = np.asarray(int2label)

    nh, nw = 224, 224
    im = caffe.io.load_image(img_path)
    im = caffe.io.resize_image(im, [nh, nw])

    img_mean = np.array([103.939, 116.779, 123.68], dtype=np.float32)
    transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
    transformer.set_transpose('data', (2, 0, 1))     # H x W x C -> C x H x W
    transformer.set_channel_swap('data', (2, 1, 0))  # RGB to BGR
    transformer.set_raw_scale('data', 255)           # [0,1] to [0,255]
    transformer.set_mean('data', img_mean)
    transformer.set_input_scale('data', 0.017)

    net.blobs['data'].reshape(1, 3, nh, nw)
    net.blobs['data'].data[...] = transformer.preprocess('data', im)
    out = net.forward()

    prob = out['prob']
    prob = np.squeeze(prob)
    idx = np.argsort(-prob)
    print(idx[0:5])
    print(int2label[idx[0:5]])
The model was downloaded from the Baidu disk, the picture is cat.jpg, and the labels are from https://gist.github.com/shicai/fa9f98edc23521382955d4731636d1af. The prediction result is:
[892 681 916 644 650]
["'n04548280 wall clock'" "'n03832673 notebook, notebook computer'" "'n06359193 web site, website, internet site, site'" "'n03729826 matchstick'" "'n03759954 microphone, mike'"]
Thanks, Jinming
Maybe your model is corrupted; please download the model again.
Hi @shicai, I tested the prediction of the DenseNet_121 model and it is OK, but I have re-downloaded DenseNet_161 and it still does not work. Could you please check the DenseNet_161 model on the Baidu disk?
It should be OK, since it has been tested by others on GitHub. Or you can try to download the model from Google Drive.
@shicai I have tested the model from Google Drive; the result is the same and still wrong. But just by changing 161 to 121, the result becomes correct. Could you test the 161 model?
I will also download the 169 and 201 models to verify this problem.
@shicai
Here are some test results:
With image shape 224×224: the {121, 169, 201} models are OK, but the 161 model (Baidu disk and Google Drive) cannot predict cat.jpg or a dog picture (whole body). It can predict a husky dog (head only); the top-5 results are ["'n02108915 French bulldog'" "'n02110185 Siberian husky'" "'n02808304 bath towel'" "'n02085620 Chihuahua'" "'n02097298 Scotch terrier, Scottish terrier, Scottie'"].
With image shape 256 on the smaller edge (resized keeping the aspect ratio): the {121, 169, 201} models are OK, but the 161 model does not work.
So is there something wrong with the 161 model?
please check this:
md5sum DenseNet_161.caffemodel = 26fee6531e67a7c239e10fa009ca2a57
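If md5sum is not handy, a quick Python check works too (a minimal sketch; the path is wherever you saved the model):

import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    # Compute the MD5 hex digest of a file, reading it in chunks.
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

print(md5_of_file('DenseNet_161.caffemodel'))
# Expected: 26fee6531e67a7c239e10fa009ca2a57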
I downloaded DenseNet161 from Google Drive.
It works well, and the top 5 predicted labels are: [ 0 1 389 397 392]
26fee6531e67a7c239e10fa009ca2a57 DenseNet_161.caffemodel (Google Drive)
26fee6531e67a7c239e10fa009ca2a57 DenseNet_161.caffemodel.old (Baidu disk)
No, I used a tench image (n01440764_37.JPEG) from the ImageNet dataset. When using cat.jpg from the Caffe examples, the result is [282 285 281 263 287].
Oh, I found the reason. I had modified the prototxt from

name: "DENSENET_161"
input: "data"
input_dim: 1
input_dim: 3
input_dim: 224
input_dim: 224

to

name: "DENSENET_161"
input: "data"
input_dim: 1
input_dim: 3
input_dim: 1
input_dim: 1

because I had seen a prototxt in openpose written this way:

input: "image"
input_dim: 1
input_dim: 3
input_dim: 1 # This value will be defined at runtime
input_dim: 1 # This value will be defined at runtime

so I thought this would be OK. Would you please explain the reason? I haven't used Caffe before, so I don't understand some of the details.
Thank you very much! Sorry, it's my mistake.
The four values of input_dim indicate N, C, H, W respectively. N=1 means a single image, and C=3 means a color image with RGB channels. The last two values are the height and width of the input image.
openpose uses a customized Caffe, so maybe some image processing steps have been changed.
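For anyone hitting the same issue, a minimal pycaffe sketch (file names assumed to match the released model): the input_dim values in the prototxt only set the initial blob shape, and you can change it at runtime with reshape instead of editing the prototxt:

import caffe

net = caffe.Net('DenseNet_161.prototxt', 'DenseNet_161.caffemodel', caffe.TEST)

# input_dim gives the initial N, C, H, W of the 'data' blob.
print(net.blobs['data'].data.shape)  # e.g. (1, 3, 224, 224)

# Change the input size at runtime rather than editing input_dim by hand:
net.blobs['data'].reshape(1, 3, 224, 224)
net.reshape()  # propagate the new shape through all layers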
I saw that you use a convolution layer instead of an InnerProduct layer for "fc6". Though it looks weird, I am okay with that.
So I added a softmax layer after fc6 to perform classification on my own test image (a cat image), but the model failed to predict the correct class. It might be due to 2 reasons:
1) For image preprocessing, do you subtract the mean pixel and scale by 0.017 as below?

im[:,:,0] = (im[:,:,0] - 103.94) / 0.017
im[:,:,1] = (im[:,:,1] - 116.78) / 0.017
im[:,:,2] = (im[:,:,2] - 123.68) / 0.017
2) Do you use the same classes.txt (class mapping file) as Inception and ResNet? (i.e. this one)