lucjb / pydata


AffinityPropagation clustering for multicollinearity #2

Open Sandy4321 opened 5 years ago

Sandy4321 commented 5 years ago

Lucas, thank you very much for the great talk! https://www.youtube.com/watch?v=ZD8LA3n6YvI&feature=youtu.be&t=443 All your talks are really useful.

1. Regarding multicollinearity detection/reduction: as you mentioned at 7:27, AffinityPropagation (https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AffinityPropagation.html) is a good way to find clusters of correlated data. Yes, it is a great idea, thank you. I have only one question: might it be necessary, before running AffinityPropagation, to use hierarchical clustering to place similar features close to each other, as suggested by Andreas Mueller in Applied Machine Learning 2019 - Lecture 12 - Model Interpretration and Feature Selection (https://www.youtube.com/watch?v=EQQ5YQibXOI&feature=youtu.be&t=3422) at 57:03 and on slide 29 (https://amueller.github.io/COMS4995-s19/slides/aml-12-interpretation-feature-selection/#p29)?

    import numpy as np
    from scipy.cluster import hierarchy

    # cov is the feature covariance matrix defined earlier in the slides
    order = np.array(hierarchy.dendrogram(
        hierarchy.ward(cov), no_plot=True)['ivl'], dtype="int")
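For reference, here is a minimal, self-contained sketch of how that ordering could be used to group correlated features. It is only an illustration under my own assumptions: the synthetic data, the variable names (`X`, `corr`, `corr_ordered`, `labels`) and the choice of three clusters are hypothetical and not taken from the talk or the slides, and `hierarchy.leaves_list` is used as a shortcut for reading the leaf order instead of going through `dendrogram`.

```python
import numpy as np
from scipy.cluster import hierarchy

# Synthetic data (assumption): 3 latent factors, each driving 2 noisy features.
# Columns are interleaved so that correlated features are NOT adjacent:
# features (0, 3), (1, 4) and (2, 5) share a factor.
rng = np.random.default_rng(0)
n_samples = 500
base = rng.normal(size=(n_samples, 3))
X = np.column_stack(
    [base[:, i % 3] + 0.1 * rng.normal(size=n_samples) for i in range(6)]
)

# Feature-by-feature correlation matrix.
corr = np.corrcoef(X, rowvar=False)

# Ward linkage on the rows of the matrix, in the same spirit as
# hierarchy.ward(cov) on the slide.
linkage = hierarchy.ward(corr)

# Leaf order of the dendrogram: similar features end up next to each other.
order = hierarchy.leaves_list(linkage)
corr_ordered = corr[order][:, order]

# Optionally cut the tree into flat clusters (3 is an assumption here).
labels = hierarchy.fcluster(linkage, t=3, criterion="maxclust")

print("feature order:", order)
print("cluster labels:", labels)
```

Plotting `corr_ordered` as a heatmap should then show the correlated features as blocks along the diagonal. Note that the reordering only changes how the matrix is displayed; it does not change the pairwise similarities themselves.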

2. You also mentioned that you do not use AffinityPropagation for this any more. Could you share what you use instead, and hopefully a code example?

Thank you very much in advance....

Sandy4321 commented 5 years ago

Maybe you meant something like this: https://cran.r-project.org/web/packages/apcluster/apcluster.pdf "The central function is apcluster. It runs affinity propagation on a given similarity matrix or it creates a similarity matrix for a given data set and similarity measure and runs affinity propagation on this matrix."

Sandy4321 commented 5 years ago

Or this: https://stats.stackexchange.com/questions/275720/does-any-other-clustering-algorithms-take-correlation-as-distance-metric-apart "Yes, first you use dist = sklearn.metrics.pairwise.pairwise_distances(data) to calculate the distance matrix from your data, and then you use the resulting dist object as input to the clustering algorithms, remembering to select the option affinity="precomputed" for affinity propagation or metric="precomputed" in the case of DBSCAN. BTW for affinity propagation I think you need to transform the distances into similarities."
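A rough sketch of what the quoted answer describes, applied to features rather than samples, might look like the code below. Everything specific in it is an assumption of mine: the synthetic data, the distance 1 - |correlation| (chosen so that strongly negatively correlated features also count as similar), and the conversion to a similarity by negating the distance. It is not the method from the talk.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Synthetic data (assumption): 3 latent factors, each driving 2 noisy features.
rng = np.random.default_rng(0)
n_samples = 500
base = rng.normal(size=(n_samples, 3))
X = np.column_stack(
    [base[:, i // 2] + 0.05 * rng.normal(size=n_samples) for i in range(6)]
)

# Distance between features: 1 - |Pearson correlation| (an assumed choice).
corr = np.corrcoef(X, rowvar=False)
dist = 1.0 - np.abs(corr)

# Affinity propagation expects similarities (larger means more alike),
# so the distances are negated before being passed in.
similarity = -dist

ap = AffinityPropagation(affinity="precomputed", random_state=0)
labels = ap.fit(similarity).labels_

# Exemplar features: one candidate representative per cluster.
exemplars = ap.cluster_centers_indices_

print("cluster labels:", labels)
print("exemplar feature indices:", exemplars)
```

Keeping only the exemplar features is one possible way to reduce multicollinearity. The number of clusters is not set directly; it is driven by the `preference` parameter, which defaults to the median of the input similarities.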