miguelgfierro closed 6 years ago
Structural similarity index (SSIM) for image similarity:
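A rough sketch of what SSIM computes, in pure Python. This is the simplified single-window (global) form applied to two grayscale images given as flat lists of pixel values. Library implementations usually average the index over local sliding windows, so the numbers will differ. The function name `global_ssim` is made up for this note; `k1=0.01` and `k2=0.03` are the constants commonly used in the SSIM literature.

```python
def global_ssim(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equal-length sequences of pixel values.

    Assumption: the whole image is treated as one window, which is a
    simplification of the usual locally windowed SSIM.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    # Stabilising constants, scaled by the dynamic range of the pixel values.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


image = [0, 64, 128, 255]
inverted = [255 - p for p in image]
same_score = global_ssim(image, image)      # identical images score ~1
diff_score = global_ssim(image, inverted)   # inverted image scores much lower
```

Identical images give a score of about 1, and the score drops as luminance, contrast, or structure diverge.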
Ways to describe a dataset:
- Center. Graphically, the center of a distribution is the point where about half of the observations are on either side.
- Spread. The spread of a distribution refers to the variability of the data. If the observations cover a wide range, the spread is larger. If the observations are clustered around a single value, the spread is smaller.
- Shape. The shape of a distribution is described by symmetry, skewness, number of peaks, etc.
- Unusual features. Unusual features refer to gaps (areas of the distribution where there are no observations) and outliers.
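The four descriptors above could be computed roughly like this, using only the standard library. The function name `describe` and the choices made here are assumptions for illustration: median for center, range and population standard deviation for spread, moment-based skewness for shape, and the 1.5 x IQR rule for outliers.

```python
import statistics


def describe(values):
    """Summarise center, spread, shape and unusual features of a 1-D sample."""
    data = sorted(values)
    n = len(data)
    mean = statistics.fmean(data)
    stdev = statistics.pstdev(data)
    # Quartiles with the statistics module's default ("exclusive") method.
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    # Moment-based skewness: > 0 means a longer right tail.
    third_moment = sum((v - mean) ** 3 for v in data) / n
    skewness = third_moment / stdev ** 3 if stdev > 0 else 0.0
    # Assumed outlier rule: points beyond 1.5 * IQR from the quartiles.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "center": statistics.median(data),
        "range": data[-1] - data[0],
        "stdev": stdev,
        "skewness": skewness,
        "largest_gap": max(b - a for a, b in zip(data, data[1:])),
        "outliers": [v for v in data if v < low or v > high],
    }


summary = describe([1, 2, 2, 3, 3, 3, 4, 4, 5, 50])
```

For this sample the value 50 is flagged as an outlier and the skewness is positive, since the single large value stretches the right tail.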
Comparing Measures of Sparsity: https://arxiv.org/abs/0811.4706
Gini index (measures inequality or sparsity): https://github.com/oliviaguest/gini
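For reference, a minimal pure-Python sketch of the Gini index using the usual sorted-values formula, G = sum_i (2i - n - 1) x_i / (n * sum_i x_i) with x sorted ascending and i starting at 1. This is not copied from the linked repository; the function name `gini` and the handling of an all-zero input are assumptions.

```python
def gini(values):
    """Gini index of non-negative values: 0 = perfectly even, (n-1)/n = maximally sparse."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("need at least one value")
    if data[0] < 0:
        raise ValueError("values must be non-negative")
    total = sum(data)
    if total == 0:
        return 0.0  # assumption: an all-zero vector is treated as perfectly even
    # Rank-weighted sum over the ascending values, with ranks starting at 1.
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(data, start=1))
    return weighted / (n * total)


even = gini([1, 1, 1, 1])     # → 0.0
sparse = gini([0, 0, 0, 1])   # → 0.75
```

A vector with all its mass in one entry reaches the maximum (n - 1) / n, which is why the index can double as a sparsity measure.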
Clustering:
- http://scikit-learn.org/stable/auto_examples/cluster/plot_cluster_comparison.html
- https://openproceedings.org/2014/conf/edbt/FriesWS14.pdf
- Performance of clustering algorithms: http://hdbscan.readthedocs.io/en/latest/performance_and_scalability.html
- DBSCAN and HDBSCAN seem interesting
- Explanation of different clustering algorithms: http://hdbscan.readthedocs.io/en/latest/comparing_clustering_algorithms.html
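To make the DBSCAN idea concrete, here is a minimal brute-force sketch in pure Python. It is only meant to show the core-point and border-point logic. It is O(n^2) and is not how the libraries linked above implement it, since they rely on spatial indexes. The name `dbscan` and the convention that `min_samples` counts the point itself are assumptions of this sketch.

```python
import math

NOISE = -1


def dbscan(points, eps, min_samples):
    """Return one cluster label per point; NOISE (-1) marks points in no cluster."""

    def neighbours(i):
        # Brute-force radius query; includes the point itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_samples:  # j is a core point, keep expanding
                queue.extend(reachable)
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)  # → [0, 0, 0, 1, 1, 1, -1]
```

The two tight groups become clusters 0 and 1, and the isolated point is left as noise. Unlike k-means, the number of clusters is not fixed in advance.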
Two dimensions, as stated in this transfer learning tutorial: size and similarity to the original dataset.
A) Size: this can be measured as the total number of examples and the number of examples per class. Here maybe we can use a weighted quadratic difference.
B) Similarity: there is literature on image similarity (maybe KL divergence for colour and texture).
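One possible reading of the two ideas above, as a pure-Python sketch. Everything here is an assumption rather than an agreed design: the names `size_distance` and `kl_divergence`, the two weights, and the small `eps` added to every histogram bin so that empty bins do not make the KL divergence infinite.

```python
import math


def size_distance(counts_a, counts_b, total_weight=1.0, class_weight=1.0):
    """Weighted quadratic difference between two datasets' sizes.

    counts_a, counts_b: dicts mapping class label -> number of examples.
    """
    classes = set(counts_a) | set(counts_b)
    total_term = (sum(counts_a.values()) - sum(counts_b.values())) ** 2
    class_term = sum(
        (counts_a.get(c, 0) - counts_b.get(c, 0)) ** 2 for c in classes
    )
    return total_weight * total_term + class_weight * class_term


def kl_divergence(hist_p, hist_q, eps=1e-12):
    """KL(P || Q) between two histograms over the same bins (e.g. colour bins)."""
    if len(hist_p) != len(hist_q) or len(hist_p) == 0:
        raise ValueError("histograms must be non-empty and share the same bins")
    bins = len(hist_p)
    sum_p = sum(hist_p) + eps * bins
    sum_q = sum(hist_q) + eps * bins
    # Smooth and normalise so both histograms become probability distributions.
    p = [(h + eps) / sum_p for h in hist_p]
    q = [(h + eps) / sum_q for h in hist_q]
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


d_size = size_distance({"cat": 10, "dog": 20}, {"cat": 12, "dog": 20})  # → 8.0
d_same = kl_divergence([5, 3, 2], [5, 3, 2])                            # → 0.0
d_diff = kl_divergence([5, 3, 2], [1, 1, 8])
```

Note that KL divergence is not symmetric, so `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ. A symmetric variant might be preferable if the two datasets should be treated equally.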