dacong001 closed this issue 6 years ago.
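Yes. If the model construction is not moved out of the function, indexing a large number of images runs out of memory.
Also, once it has been moved out, you must call predict once immediately after creating the model; see here.
My modified code: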
# -*- coding: utf-8 -*-
# Author: yongyuan.name
# [1] https://zhuanlan.zhihu.com/p/27101000
import numpy as np
from numpy import linalg as LA
from keras.applications.vgg16 import VGG16
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input

input_shape = (224, 224, 3)

# Build the model once at module level instead of inside extract_feat
model = VGG16(weights='imagenet', input_shape=input_shape, pooling='max',
              include_top=False)
# Run one dummy prediction right after creating the model ("machine voodoo", see [1])
model.predict(np.zeros((1,) + input_shape))
def extract_feat(img_path):
    '''
    Use the VGG16 model to extract features.
    Return an L2-normalized feature vector.
    '''
    # weights: 'imagenet'
    # pooling: 'max' or 'avg'
    # input_shape: (height, width, 3), height and width should be >= 48
    img = image.load_img(img_path, target_size=(input_shape[0], input_shape[1]))
    img = image.img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = preprocess_input(img)
    feat = model.predict(img)
    # L2-normalize so that a dot product between two features equals their cosine similarity
    norm_feat = feat[0] / LA.norm(feat[0])
    return norm_feat
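Why are the image features divided by their norm, i.e. feat[0]/LA.norm(feat[0])?
@nikenj Similarity is measured with cosine similarity, so the features are normalized by their L2 norm.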
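A minimal NumPy sketch of why the L2 normalization matters: once each feature vector is divided by its L2 norm, the plain dot product of two vectors equals their cosine similarity, so a query can be scored against every indexed image with a single matrix product. The random 512-dimensional vectors are stand-ins for VGG16 max-pooled features, and the helper names (l2_normalize, cosine_similarity) are illustrative, not part of the code above.

```python
import numpy as np
from numpy import linalg as LA


def l2_normalize(v):
    # Same operation as feat[0] / LA.norm(feat[0]) in extract_feat
    return v / LA.norm(v)


def cosine_similarity(a, b):
    # Textbook definition: dot product divided by the product of the L2 norms
    return np.dot(a, b) / (LA.norm(a) * LA.norm(b))


rng = np.random.default_rng(0)
a = rng.random(512)  # stand-ins for 512-d VGG16 features
b = rng.random(512)

# After L2 normalization, the dot product alone gives the cosine similarity
dot_of_normalized = np.dot(l2_normalize(a), l2_normalize(b))
print(np.isclose(dot_of_normalized, cosine_similarity(a, b)))  # True

# Scoring a query against a whole (toy) index is then one matrix-vector product
index = np.stack([l2_normalize(rng.random(512)) for _ in range(5)])
query = l2_normalize(rng.random(512))
scores = index @ query
ranking = np.argsort(scores)[::-1]  # indices of the most similar images first
print(ranking)
```

Because every indexed vector has unit length, precomputing the normalization at indexing time means no norms need to be recomputed at query time.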
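Then why normalize with the L2 norm? Can't you use something else, such as min-max?
For cosine similarity it has to be L2 normalization: cosine similarity divides the dot product by the L2 norms, so L2 normalization is what makes the plain dot product of two features equal their cosine similarity; min-max scaling does not. If you want to use a different distance metric, use the normalization that corresponds to it.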
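A small check of the answer above, using random vectors as assumed stand-ins for features: after L2 normalization the dot product equals the cosine similarity, while after min-max scaling (shown only for comparison) it does not, because min-max shifts the vector and changes its direction. The last line checks a related fact: for unit-length vectors, the squared Euclidean distance equals 2 - 2·cos, so ranking L2-normalized features by Euclidean distance gives the same order as ranking by cosine similarity.

```python
import numpy as np
from numpy import linalg as LA


def l2_normalize(v):
    return v / LA.norm(v)


def min_max_normalize(v):
    # Rescales entries to [0, 1]; the shift by v.min() changes the vector's direction
    return (v - v.min()) / (v.max() - v.min())


def cosine_similarity(a, b):
    return np.dot(a, b) / (LA.norm(a) * LA.norm(b))


rng = np.random.default_rng(1)
a = rng.standard_normal(512)
b = rng.standard_normal(512)
cos_ab = cosine_similarity(a, b)

# L2: the dot product of the normalized vectors is the cosine similarity
l2_dot = np.dot(l2_normalize(a), l2_normalize(b))

# Min-max: the dot product of the scaled vectors is not the cosine similarity
mm_dot = np.dot(min_max_normalize(a), min_max_normalize(b))

# For unit-length vectors, ||a - b||^2 = 2 - 2 * cos(a, b)
sq_dist = LA.norm(l2_normalize(a) - l2_normalize(b)) ** 2

print(np.isclose(l2_dot, cos_ab))           # True
print(np.isclose(mm_dot, cos_ab))           # False
print(np.isclose(sq_dist, 2 - 2 * cos_ab))  # True
```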
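Solved: the model no longer has to be initialized every time features are extracted.
It had been defined repeatedly, which is why processing images took so long when building the index.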
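The fix discussed in this thread (build the model once and reuse it, instead of redefining it inside every feature-extraction call) can be sketched in plain Python. Here build_model is a hypothetical, cheap stand-in for the expensive VGG16(...) constructor, and functools.lru_cache is just one possible way to build it lazily exactly once; the modified code above achieves the same thing more directly by creating model at module level.

```python
import functools

BUILD_COUNT = 0


def build_model():
    # Hypothetical stand-in for VGG16(weights='imagenet', ...): expensive to create
    global BUILD_COUNT
    BUILD_COUNT += 1
    return lambda x: x * 2  # fake "predict"


@functools.lru_cache(maxsize=None)
def get_model():
    # Built on the first call; the same object is returned on every later call
    model = build_model()
    model(0)  # warm-up call right after construction, mirroring the advice above
    return model


def extract_feat_slow(x):
    # The pattern that caused the problem: a new model for every image
    return build_model()(x)


def extract_feat_fast(x):
    return get_model()(x)


for x in range(100):
    extract_feat_fast(x)
print(BUILD_COUNT)  # 1

for x in range(100):
    extract_feat_slow(x)
print(BUILD_COUNT)  # 101
```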