The model definition should be moved out of the function

dacong001 commented 7 years ago

The model is redefined on every call, so processing images takes a long time when building the index.

duchengyao commented 7 years ago

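Yes. If it is not moved out, indexing a large number of images runs out of memory.

Also, once it is moved out, you must call predict once immediately after creating the model; see here

My modified code is: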

# -*- coding: utf-8 -*-
# Author: yongyuan.name
# [1]https://zhuanlan.zhihu.com/p/27101000

import numpy as np
from numpy import linalg as LA

from keras.applications.vgg16 import VGG16
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input

input_shape = (224, 224, 3)
model = VGG16(weights='imagenet', input_shape=input_shape, pooling='max',
              include_top=False)
model.predict(np.zeros((1,) + input_shape))  # warm-up predict right after loading; "machine-learning black magic" [1]

'''
 Use vgg16 model to extract features
 Output normalized feature vector
'''
def extract_feat(img_path):
    # weights: 'imagenet'
    # pooling: 'max' or 'avg'
    # input_shape: (height, width, 3), height and width should be >= 48

    img = image.load_img(img_path, target_size=(input_shape[0], input_shape[1]))
    img = image.img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = preprocess_input(img)
    feat = model.predict(img)
    norm_feat = feat[0]/LA.norm(feat[0])
    return norm_feat
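
The effect of building the model once can be sketched without Keras. This is only an illustration: `DummyModel`, `get_model`, and `LOAD_COUNT` are hypothetical names standing in for the expensive VGG16 construction, and `functools.lru_cache` is one possible alternative to a module-level global, not what the code above does.

```python
from functools import lru_cache

LOAD_COUNT = 0  # counts how many times the stand-in model is built


class DummyModel:
    """Hypothetical stand-in for an expensive model such as VGG16."""

    def __init__(self):
        global LOAD_COUNT
        LOAD_COUNT += 1

    def predict(self, batch):
        # Pretend feature: the sum of each input row.
        return [sum(row) for row in batch]


@lru_cache(maxsize=1)
def get_model():
    # Built on the first call only; later calls return the cached instance.
    model = DummyModel()
    model.predict([[0.0, 0.0]])  # warm-up call right after construction
    return model


def extract_feat(row):
    # The function only uses the shared model; it never constructs one.
    return get_model().predict([row])[0]


features = [extract_feat([i, i + 1]) for i in range(1000)]
print(LOAD_COUNT)
```

Constructing `DummyModel()` inside `extract_feat` instead would build it 1000 times here, which is the pattern this issue reports as slow and memory-hungry.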

nikenj commented 6 years ago

Why is the image feature divided by its norm? feat[0]/LA.norm(feat[0])

willard-yuan commented 6 years ago

@nikenj Cosine similarity is used, so the features are normalized by their L2 norm.

nikenj commented 6 years ago

Then why L2 normalization in particular? Couldn't something else be used, such as min-max?

willard-yuan commented 6 years ago

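For cosine similarity the normalization has to be L2, because only then does the dot product of two features equal their cosine similarity. If you want to use a different distance metric, you can apply the normalization that corresponds to it.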
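
A small numeric check of this point, written with the standard library only so it runs without Keras or NumPy. The helper names and the two example vectors are made up for illustration; in the code above the same normalization is done with `LA.norm`.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    # Divide by the Euclidean length, like feat[0] / LA.norm(feat[0]).
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def min_max_normalize(v):
    # Rescale the components into [0, 1].
    lo, hi = min(v), max(v)
    return [(x - lo) / (hi - lo) for x in v]


def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [1.0, 2.0, 3.0]
b = [2.0, 0.0, 1.0]

cos_ab = cosine_similarity(a, b)
dot_l2 = dot(l2_normalize(a), l2_normalize(b))
dot_minmax = dot(min_max_normalize(a), min_max_normalize(b))

print(cos_ab, dot_l2, dot_minmax)
```

With L2-normalized vectors the plain dot product matches the cosine similarity, so a stored index can be searched with dot products alone. The min-max version gives a different number for the same pair.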

willard-yuan commented 6 years ago

The problem of the model being initialized on every feature extraction has been fixed.

willard-yuan / flask-keras-cnn-image-retrieval

The model definition should be moved out of the function #4