tkellogg / fossil

A mastodon client optimized for reading, with an AI-enabled algorithm for displaying posts
https://www.fossil-social.com

New Algorithm: DBSCAN clustering #16

Open tkellogg opened 9 months ago

tkellogg commented 9 months ago

I like the clustering approach, but I don't like that k-means makes you say up front how many clusters there are going to be (I'm discovering too, it's a new day, I don't know yet, right?). I want to experiment with other clustering algorithms that make different assumptions and trade-offs about the data.

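DBSCAN seems interesting because it finds clusters based on density. Instead of a cluster count, you have to say what the expected density should be: the threshold that defines a cluster, given as a neighborhood radius (usually called eps) and a minimum number of points that must fall within that radius.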
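To make the density threshold concrete, here is a minimal pure-Python sketch of DBSCAN. It is not code from fossil: the `dbscan` and `cosine_distance` helpers, the toy 2-D vectors standing in for post embeddings, and the choice of cosine distance are all assumptions for illustration. A real experiment would more likely use scikit-learn's `sklearn.cluster.DBSCAN`, which takes the same two knobs as `eps` and `min_samples`.

```python
import math

NOISE = -1  # same label scikit-learn's DBSCAN uses for unclustered points


def cosine_distance(a, b):
    """1 - cosine similarity (assumed metric; embeddings are often compared this way)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def dbscan(points, eps, min_samples, distance=cosine_distance):
    """Return one label per point: 0, 1, 2, ... for clusters, NOISE otherwise."""
    n = len(points)
    # Each point's neighborhood includes the point itself.
    neighbors = [
        [j for j in range(n) if distance(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        if len(neighbors[i]) < min_samples:
            labels[i] = NOISE  # not dense enough; may become a border point later
            continue
        # i is a core point: start a new cluster and grow it outward.
        labels[i] = cluster_id
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable but not dense itself
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is also a core point; keep expanding
        cluster_id += 1
    return labels


# Hypothetical toy "embeddings": two tight groups and one outlier.
points = [
    (1.0, 0.0), (0.99, 0.05), (0.98, 0.1),
    (0.0, 1.0), (0.05, 0.99), (0.1, 0.98),
    (-1.0, -1.0),
]
labels = dbscan(points, eps=0.05, min_samples=3)
print(labels)
```

The number of clusters is never passed in. It falls out of `eps` and `min_samples`, which is also why those two values would need retuning for each embedding model.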

I expect it'll take a lot of tweaking to make it work for a given embedding model, but once it's tuned it should be a lot more dynamic and robust.

Note: DBSCAN doesn't assign all posts to a cluster (low-density posts are left over as noise), so you might not be able to use toot_clusters.html on its own. You'll probably need an offshoot of it. Feel free to skip this part on the first pass of the PR; we might even be able to get someone else to do it.
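One possible way to handle the leftover posts, sketched under assumptions: the helper name `split_clusters` and the string stand-ins for posts are hypothetical, and the `-1` noise label follows scikit-learn's DBSCAN convention. The idea is to separate clustered posts from unclustered ones before rendering, so that a template can show the leftovers in their own section.

```python
from collections import defaultdict

NOISE_LABEL = -1  # scikit-learn's DBSCAN marks unclustered points with -1


def split_clusters(posts, labels):
    """Group posts by cluster label; collect noise posts in a separate list."""
    clusters = defaultdict(list)
    unclustered = []
    for post, label in zip(posts, labels):
        if label == NOISE_LABEL:
            unclustered.append(post)
        else:
            clusters[label].append(post)
    return dict(clusters), unclustered


# Hypothetical posts and labels, purely for illustration.
posts = ["post a", "post b", "post c", "post d"]
labels = [0, -1, 0, 1]
clusters, unclustered = split_clusters(posts, labels)
print(clusters)     # {0: ['post a', 'post c'], 1: ['post d']}
print(unclustered)  # ['post b']
```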