zengbin93 / blog


Notes - Similarity Computation #12

Open zengbin93 opened 6 years ago

zengbin93 commented 6 years ago

Summary of similarity computation methods

Commonly used similarity measures include: 1) Euclidean distance; 2) Manhattan distance; 3) Minkowski distance; 4) cosine distance; 5) dynamic time warping (DTW).
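For reference, the first four measures have the following standard definitions for two vectors of equal length. The symbols x, y, n, and p are notation introduced here and do not appear in the original note.

```latex
\begin{align}
d_{\text{Euclidean}}(x, y) &= \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \\
d_{\text{Manhattan}}(x, y) &= \sum_{i=1}^{n} \lvert x_i - y_i \rvert \\
d_{\text{Minkowski}}(x, y) &= \Bigl( \sum_{i=1}^{n} \lvert x_i - y_i \rvert^{p} \Bigr)^{1/p}, \quad p \ge 1 \\
\cos(x, y) &= \frac{x \cdot y}{\lVert x \rVert_2 \, \lVert y \rVert_2}
\end{align}
```

The Minkowski distance reduces to the Manhattan distance at p = 1 and to the Euclidean distance at p = 2. The last line is the cosine similarity; the cosine distance is usually taken as 1 minus that value.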

import numpy as np

vector1 = [8.92, 8.71, 8.77, 8.75, 8.74, 8.71, 8.66, 8.71, 8.74, 8.8, 8.8, 
           8.79, 8.64, 8.56, 8.56, 8.43, 8.38, 8.42]
vector2 = [32.99, 32.27, 32.16, 31.93, 32.8, 33.16, 32.59, 32.61, 29.35, 
           28.41, 27.85, 28.62, 28.62, 29.35, 30.14, 29.34, 28.88, 29.05]

# vector1 = [1, 1, 1, 1]
# vector2 = [1, 1, 1, 1]
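def euclidean_distant(vector1, vector2):
    """Euclidean distance between two equal-length vectors."""
    # Use 1-D arrays instead of np.mat, which was removed in NumPy 2.0
    diff = np.asarray(vector1, dtype=float) - np.asarray(vector2, dtype=float)
    # The inner product of diff with itself is the sum of squared differences
    return np.sqrt(np.dot(diff, diff)).item()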

print(euclidean_distant(vector1, vector2))
93.19353786609885
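def manhattan_distant(vector1, vector2):
    """Manhattan (L1) distance between two equal-length vectors."""
    # Use 1-D arrays instead of np.mat, which was removed in NumPy 2.0
    diff = np.asarray(vector1, dtype=float) - np.asarray(vector2, dtype=float)
    return np.sum(np.abs(diff))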

print(manhattan_distant(vector1, vector2))
394.03
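The Minkowski distance appears in the list of methods but has no code in this note. A minimal sketch follows, written under the usual definition with order p >= 1. The function name `minkowski_distance` and the parameter `p` are illustrative choices, not names from the original.

```python
import numpy as np


def minkowski_distance(vector1, vector2, p=2):
    """Minkowski distance of order p between two equal-length vectors (sketch)."""
    if p < 1:
        # For p < 1 the triangle inequality fails, so it is not a metric
        raise ValueError("p must be >= 1")
    diff = np.abs(np.asarray(vector1, dtype=float) - np.asarray(vector2, dtype=float))
    return float(np.sum(diff ** p) ** (1.0 / p))


a = [0.0, 0.0]
b = [3.0, 4.0]
print(minkowski_distance(a, b, p=1))  # same as the Manhattan distance: 3 + 4
print(minkowski_distance(a, b, p=2))  # same as the Euclidean distance: sqrt(9 + 16)
```

Setting p = 1 or p = 2 gives a quick consistency check against the Manhattan and Euclidean functions.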
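def cosine_distant(vector1, vector2):
    """Cosine similarity (despite the name); cosine distance is 1 minus this value."""
    # Use 1-D arrays: np.dot on two 1xN np.mat rows raises a shape error,
    # and np.mat was removed in NumPy 2.0
    vector1 = np.asarray(vector1, dtype=float)
    vector2 = np.asarray(vector2, dtype=float)
    vector1_norm = np.linalg.norm(vector1)
    vector2_norm = np.linalg.norm(vector2)
    dot_norm = vector1_norm * vector2_norm
    dot_vs = np.dot(vector1, vector2)
    return np.divide(dot_vs, dot_norm).item()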

print(cosine_distant(vector1, vector2))    
0.9983665339530308
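Dynamic time warping (DTW) is the last method in the list and has no code in this note. The sketch below assumes the textbook O(n*m) dynamic-programming formulation with absolute difference as the local cost and no window constraint. The name `dtw_distance` is hypothetical, and a dedicated library would be the better choice for long series.

```python
import numpy as np


def dtw_distance(series1, series2):
    """Accumulated DTW cost between two 1-D sequences (unconstrained sketch)."""
    n, m = len(series1), len(series2)
    # cost[i, j] = minimal accumulated cost aligning series1[:i] with series2[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(series1[i - 1] - series2[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments
            cost[i, j] = local + min(cost[i - 1, j],      # step in series1 only
                                     cost[i, j - 1],      # step in series2 only
                                     cost[i - 1, j - 1])  # step in both
    return float(cost[n, m])


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0: the repeated 2 is absorbed by warping
print(dtw_distance([1, 2, 3], [2, 3, 4]))     # 2.0
```

Unlike the other measures here, DTW does not require the two sequences to have the same length, which is why it is commonly used for time series that are shifted or stretched relative to each other.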

from sklearn.metrics.pairwise import cosine_similarity