Hi, did you normalise your input image by subtracting the mean? The mean is given in senet50_128.py.
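Something like this (a minimal sketch; 'face.jpg' is just a placeholder for one of your inputs, and the mean values are the ones from the model file):

import numpy as np
from PIL import Image

mean = np.array([131.0912, 103.8827, 91.4953])                  # per-channel means from senet50_128.py
img = np.array(Image.open('face.jpg').convert('RGB'), np.float32)
img -= mean                                                     # broadcasts over the channel axis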
Hi, I tried normalising the inputs as you said, but it didn't work. I also tried senet50_256.
Do you mind trying the images given in samples/tight_crop?
If that doesn't work, then there must be something wrong.
Still not working. Your paper says: "first the extended bounding box of the face is resized so that the shorter side is 256 pixels; then the centre 224×224 crop of the face image is used as input to the network." Following that, I directly resized the shorter side of the pictures in tight_crop to 256 and then extracted the centre 224×224 crop as input, roughly like this:
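(A sketch of that preprocessing; 'face.jpg' stands in for one of the tight_crop images.)

import numpy as np
from PIL import Image

img = Image.open('face.jpg').convert('RGB')       # the extended face crop
w, h = img.size
ratio = 256.0 / min(w, h)                         # shorter side -> 256 px
img = img.resize((int(np.ceil(w * ratio)), int(np.ceil(h * ratio))), Image.BILINEAR)
w, h = img.size
left, top = (w - 224) // 2, (h - 224) // 2        # centre 224x224 crop
img = img.crop((left, top, left + 224, top + 224))
x = np.array(img, dtype=np.float32)               # network input before mean subtraction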
OK, I have tested the model and it works perfectly fine. I tested on the images in samples/tight_crop. Here is the code:
# Code:
from __future__ import absolute_import
from __future__ import print_function
import os
import sys
import pdb
import PIL
import torch
import glob as gb
import numpy as np
from PIL import Image

batch_size = 10
mean = (131.0912, 103.8827, 91.4953)  # per-channel image means, as given in the model file


def load_data(path='', shape=None):
    short_size = 224.0
    crop_size = shape
    img = PIL.Image.open(path)
    im_shape = np.array(img.size)  # PIL size is (width, height)
    img = img.convert('RGB')
    ratio = float(short_size) / np.min(im_shape)  # resize so the shorter side is short_size
    img = img.resize(size=(int(np.ceil(im_shape[0] * ratio)),   # width
                           int(np.ceil(im_shape[1] * ratio))),  # height
                     resample=PIL.Image.BILINEAR)
    x = np.array(img)  # numpy array is (height, width, 3)
    newshape = x.shape[:2]
    h_start = (newshape[0] - crop_size[0]) // 2
    w_start = (newshape[1] - crop_size[1]) // 2
    x = x[h_start:h_start + crop_size[0], w_start:w_start + crop_size[1]]  # centre crop
    x = x - mean  # subtract the channel means
    return x


def chunks(l, n):
    # Yield successive n-sized chunks from l.
    for i in range(0, len(l), n):
        yield l[i:i + n]


def initialize_model():
    # Set basic environments.
    # Initialize GPUs
    import resnet50_128 as model
    network = model.resnet50_128(weights_path='../model/resnet50_128.pth')
    network.eval()
    return network


def image_encoding(model, facepaths):
    print('==> compute image-level feature encoding.')
    num_faces = len(facepaths)
    face_feats = np.empty((num_faces, 128))
    imgpaths = facepaths
    imgchunks = list(chunks(imgpaths, batch_size))
    for c, imgs in enumerate(imgchunks):
        im_array = np.array([load_data(path=i, shape=(224, 224, 3)) for i in imgs])
        # NHWC -> NCHW, forward pass, take the (N, 128, 1, 1) feature output and squeeze it
        f = model(torch.Tensor(im_array.transpose(0, 3, 1, 2)))[1].detach().cpu().numpy()[:, :, 0, 0]
        start = c * batch_size
        end = min((c + 1) * batch_size, num_faces)
        # This is different from the Keras model, where the normalisation is done inside the model.
        face_feats[start:end] = f / np.sqrt(np.sum(f ** 2, -1, keepdims=True))
        if c % 50 == 0:
            print('-> finish encoding {}/{} images.'.format(c * batch_size, num_faces))
    return face_feats


if __name__ == '__main__':
    facepaths = gb.glob('../samples/*/*.jpg')
    model_eval = initialize_model()
    face_feats = image_encoding(model_eval, facepaths)
    S = np.dot(face_feats, face_feats.T)  # cosine similarity matrix of all pairs
    import pylab as plt
    plt.imshow(S)
    plt.show()
And you should expect to see the similarity matrix:
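If you prefer a numeric sanity check over eyeballing the plot, something like this (a sketch reusing facepaths and S from the script above; it assumes the samples/<identity>/*.jpg folder layout) should show clearly higher within-identity than cross-identity values:

import os
import numpy as np

ids = np.array([os.path.basename(os.path.dirname(p)) for p in facepaths])
same = ids[:, None] == ids[None, :]       # mask of same-identity pairs
off_diag = ~np.eye(len(ids), dtype=bool)  # drop the trivial diagonal entries
print('within-identity mean:', S[same & off_diag].mean())
print('cross-identity mean :', S[~same].mean())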
Problem solved, thank you very much!
Cool.
I used the model parameters in senet50_128_pytorch with my own pictures as inputs, then took the output variable feat_extract in senet50_128.py as the feature vector. Finally, I normalised the features and computed the cosine similarity of each feature pair. I have four different people, but for any pair of them the cosine values are all above 0.95. So how should I choose the feature vector? Do the parameters used to generate the bounding boxes significantly affect the feature vector?
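For reference, the normalise-and-compare step looks roughly like this (a sketch; the random feats array just stands in for my stacked feat_extract outputs, one row per image):

import numpy as np

feats = np.random.randn(4, 128).astype(np.float32)     # placeholder for stacked feat_extract outputs
feats /= np.linalg.norm(feats, axis=1, keepdims=True)  # L2-normalise each feature vector
cos = feats @ feats.T                                  # cosine similarity for every pair
print(np.round(cos, 3))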