synthesized-io / insight

🧿 Metrics & Monitoring of Datasets
BSD 3-Clause "New" or "Revised" License

Additional metrics (+tests) #61

Closed simonhkswan closed 2 years ago

simonhkswan commented 2 years ago

For the tests of the new metrics, it would be good to follow (you can search for this approach too):

Let's have a look at pytest fixtures too.

Hebruwu commented 2 years ago

Metrics: Bhattacharyya distance, Total Variation distance, Ideal number of clusters according to the average silhouette method
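
For reference, here is a minimal sketch of the two distances for discrete distributions given as counts or probabilities over the same categories. The function names and the normalisation helper are illustrative assumptions, not the API of this repository. The Bhattacharyya distance is computed as the negative log of the Bhattacharyya coefficient, and the total variation distance as half the L1 distance.

```python
import math


def _normalise(weights):
    """Scale non-negative weights so they sum to 1 (hypothetical helper)."""
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    return [w / total for w in weights]


def bhattacharyya_distance(p, q):
    """-ln(sum_i sqrt(p_i * q_i)) for two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("p and q must cover the same categories")
    p, q = _normalise(p), _normalise(q)
    coefficient = sum(math.sqrt(a * b) for a, b in zip(p, q))
    if coefficient == 0:
        # No overlap at all between the two distributions.
        return math.inf
    # Clamp tiny negative values caused by floating point rounding.
    return max(0.0, -math.log(coefficient))


def total_variation_distance(p, q):
    """0.5 * sum_i |p_i - q_i| for two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("p and q must cover the same categories")
    p, q = _normalise(p), _normalise(q)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))
```

Both functions return 0 for identical distributions. For distributions with disjoint support, the total variation distance is 1 and the Bhattacharyya distance is infinite.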

Hebruwu commented 2 years ago

As there are many clustering algorithms, I will limit the scope to k-means for now. It is a very popular algorithm, but it requires the number of clusters to be chosen in advance in order to perform well.
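
A rough sketch of how the average silhouette method could pick the number of clusters for k-means is below. It uses only the standard library so it stays self-contained. All names (`kmeans`, `mean_silhouette`, `best_k`) are hypothetical, and the deterministic farthest-point initialisation is a simplification chosen here for reproducibility, not something the issue specifies.

```python
import math


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's k-means on tuples; returns one cluster label per point."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Farthest-point initialisation (assumption: deterministic, no restarts).
    centres = [points[0]]
    while len(centres) < k:
        centres.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centres))
        )
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centre.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points
        ]
        # Update step: each centre moves to the mean of its members.
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centres.append(mean)
            else:
                new_centres.append(centres[j])  # keep an empty cluster's centre
        if new_centres == centres:
            break
        centres = new_centres
    return labels


def mean_silhouette(points, labels):
    """Average silhouette coefficient over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two non-empty clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, o) for o in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, o) for o in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def best_k(points, k_values):
    """Return the k with the highest mean silhouette, plus all scores."""
    scores = {k: mean_silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores
```

For example, `best_k(points, range(2, 6))` evaluates k = 2 to 5 and returns the k whose clustering has the highest average silhouette. The pairwise distance computation is quadratic in the number of points, so a real metric would probably need sampling or a vectorised implementation.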