smadha / MlTrio

CSCI-567 course project
Apache License 2.0

cluster users based on similarity #1

Open smadha opened 7 years ago

smadha commented 7 years ago

Two possible approaches:

  1. Run KMeans for several values of K and compare the results: http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans
  2. Cluster from a pairwise distance matrix between users.
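A rough sketch of the first option, assuming each user is already represented as a fixed-length feature vector. The helper name `sweep_k`, the synthetic `user_features`, and the use of the silhouette score to compare values of K are all assumptions for illustration; nothing in this thread fixes the selection criterion.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def sweep_k(features, k_values, seed=0):
    """Fit KMeans once per K and return {K: silhouette score}."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed)
        labels = model.fit_predict(features)
        # Silhouette is one possible quality measure (assumption);
        # inertia with an elbow plot would be another.
        scores[k] = silhouette_score(features, labels)
    return scores


# Synthetic stand-in for real user features: three well-separated groups.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
user_features = np.vstack(
    [c + rng.normal(scale=0.5, size=(30, 2)) for c in centers]
)

scores = sweep_k(user_features, range(2, 7))
best_k = max(scores, key=scores.get)
```

On real data the scores are rarely this clear-cut, so it is probably worth looking at the whole `scores` dictionary rather than only `best_k`.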

We can calculate the similarity S_ij between users i and j using Pearson correlation, cosine similarity, or a divergence such as KL divergence, among others.
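A minimal sketch of these measures with NumPy, assuming each row of a matrix is one user's feature vector. The function names and the toy `users` matrix are hypothetical. Note that KL divergence is asymmetric and measures dissimilarity, so it would need symmetrizing before being used as a distance.

```python
import numpy as np


def cosine_similarity_matrix(X):
    """S[i, j] = cosine of the angle between rows i and j."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows as zeros instead of dividing by 0
    unit = X / norms
    return unit @ unit.T


def pearson_similarity_matrix(X):
    """S[i, j] = Pearson correlation between rows i and j."""
    return np.corrcoef(X)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) after normalizing both vectors to probability distributions."""
    p = np.asarray(p, dtype=float) + eps  # eps avoids log(0)
    q = np.asarray(q, dtype=float) + eps
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))


# Toy data: user 1 is a scaled copy of user 0, user 2 is user 0 reversed.
users = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
])

S_cos = cosine_similarity_matrix(users)
S_pearson = pearson_similarity_matrix(users)

# One way to turn a similarity into the distance matrix of option 2.
D_cos = 1.0 - S_cos
```

A matrix such as `D_cos` could then be passed to a clustering method that accepts precomputed distances; which method to use is left open here.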

smadha commented 7 years ago

We will create two types of clusters:

  1. based on Word ID sequence similarity between users
  2. based on Character ID sequence similarity between users