Open ctb opened 1 month ago
hdbscan looks pretty good ;)
This is a different kind of clustering, but support for using sourmash scripts cluster
output to generate a categories CSV was added in #35.
hdbscan is implemented for hashes in https://github.com/ctb/2024-pangenome-hash-corr/blob/main/cluster-hash-assoc.py
in particular maybe add tSNE... others?
https://hdbscan.readthedocs.io/en/latest/comparing_clustering_algorithms.html
https://naomy-gomes.medium.com/k-means-clustering-explained-with-python-c7c69177b932
https://jakevdp.github.io/PythonDataScienceHandbook/05.11-k-means.html
https://scikit-learn.org/stable/auto_examples/manifold/plot_manifold_sphere.html#sphx-glr-auto-examples-manifold-plot-manifold-sphere-py
https://scikit-learn.org/stable/auto_examples/manifold/plot_compare_methods.html#sphx-glr-auto-examples-manifold-plot-compare-methods-py
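
For the tSNE idea above, here is a minimal sketch of what an embedding step might look like, using scikit-learn's `TSNE` with a precomputed distance matrix. It is not code from this repo. The toy similarity matrix, the `1 - similarity` conversion, and the helper name `tsne_from_similarity` are all assumptions made for illustration; a real version would load a comparison matrix instead of building one.

```python
import numpy as np
from sklearn.manifold import TSNE


def tsne_from_similarity(sim, perplexity=5.0, seed=42):
    """Embed samples in 2D from a square similarity matrix (hypothetical helper).

    Assumes similarities lie in [0, 1], so 1 - similarity is a valid,
    non-negative distance.
    """
    sim = np.asarray(sim, dtype=float)
    if sim.ndim != 2 or sim.shape[0] != sim.shape[1]:
        raise ValueError("expected a square similarity matrix")
    if perplexity >= sim.shape[0]:
        raise ValueError("perplexity must be smaller than the number of samples")

    dist = 1.0 - sim
    # metric="precomputed" requires init="random" (PCA init needs raw features).
    model = TSNE(
        n_components=2,
        metric="precomputed",
        init="random",
        perplexity=perplexity,
        random_state=seed,
    )
    return model.fit_transform(dist)


# Toy input (an assumption): 20 samples in two blocks of 10, with high
# similarity inside a block and low similarity between blocks.
rng = np.random.default_rng(0)
n = 20
sim = np.full((n, n), 0.1)
sim[:10, :10] = 0.9
sim[10:, 10:] = 0.9
noise = rng.uniform(0.0, 0.05, size=(n, n))
sim = np.clip(sim + (noise + noise.T) / 2.0, 0.0, 1.0)  # keep it symmetric
np.fill_diagonal(sim, 1.0)

coords = tsne_from_similarity(sim)
print(coords.shape)
```

The resulting `coords` array has one 2D point per sample and could be passed to a scatter plot, colored by category. Perplexity is set low here only because the toy matrix is small; it would need tuning on real data.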