sourmash-bio / sourmash_plugin_betterplot

Improved plotting/viz and cluster examination for sourmash
BSD 3-Clause "New" or "Revised" License

explore other clustering approaches and strategies #22

Open ctb opened 1 month ago

ctb commented 1 month ago

in particular, maybe add t-SNE... others?

https://hdbscan.readthedocs.io/en/latest/comparing_clustering_algorithms.html

https://naomy-gomes.medium.com/k-means-clustering-explained-with-python-c7c69177b932

https://jakevdp.github.io/PythonDataScienceHandbook/05.11-k-means.html

https://scikit-learn.org/stable/auto_examples/manifold/plot_manifold_sphere.html#sphx-glr-auto-examples-manifold-plot-manifold-sphere-py

https://scikit-learn.org/stable/auto_examples/manifold/plot_compare_methods.html#sphx-glr-auto-examples-manifold-plot-compare-methods-py

ctb commented 1 month ago

hdbscan looks pretty good ;)

ctb commented 3 weeks ago

a different kind of clustering, but support for generating a categories CSV from the output of sourmash scripts cluster was added in #35.

ctb commented 3 weeks ago

hdbscan implemented for hashes in https://github.com/ctb/2024-pangenome-hash-corr/blob/main/cluster-hash-assoc.py