josephius / star-clustering

A clustering algorithm that automatically determines the number of clusters and works without hyperparameter fine-tuning.
Apache License 2.0

upper threshold, (cosine) angular distance, limit scaling #4

shy1 closed 4 years ago

shy1 commented 4 years ago

adding multiple arguments to the fit() method probably goes against the intended parameter-free spirit of the project, but everything should work the same as the original code if default values are used, and some of the new word cluster results are fairly impressive
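
for anyone unfamiliar with the "(cosine) angular distance" in the title: it is the angle between two vectors, recovered from their cosine similarity. below is a minimal sketch of that formula only. it is not the code from this PR, the function name `angular_distance` is hypothetical, and the scaling to [0, 1] is an assumption made for illustration.

```python
import math


def angular_distance(u, v):
    """Angle between u and v, divided by pi so the result lies in [0, 1].

    Hypothetical helper for illustration; not taken from the PR.
    Raises ZeroDivisionError if either vector has zero length.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos_sim = dot / (norm_u * norm_v)
    # Clamp to guard against rounding pushing the value just outside [-1, 1].
    cos_sim = max(-1.0, min(1.0, cos_sim))
    return math.acos(cos_sim) / math.pi
```

unlike raw cosine distance (1 minus cosine similarity), the angle itself satisfies the triangle inequality, which is one common reason to prefer it as a distance.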

shy1 commented 4 years ago

> This looks good. I really appreciate how quickly you managed to whip this up, and over a weekend too!

thanks, i recently quit my software engineering job to focus on my research and do part-time freelance work, but i haven't actually started seeking out freelance work yet and my current research requires very little activity outside of waiting for models to finish training on large datasets, so i've got a lot of excess productivity to spare.

plus i've had a few language modeling ideas involving word embedding clusters in the past, but a lack of clustering experience was a pretty big mental roadblock preventing me from attempting to implement them. your ready-to-go word vector test script could not have been more perfect for clearing that roadblock. and seeing how different distance formulations affected the types of cluster groupings has even inspired some new ideas, so this has been a very worthwhile use of my time.

shy1 commented 4 years ago

you see any reason not to delete the feature branch now that it's been merged? doesn't make much difference at this stage, but definitely a good practice to keep things from getting cluttered over time.

the hoarder in me hates it, but i've seen enough barely navigable nightmares working with large teams to know that it's the right thing to do.

josephius commented 4 years ago

Good to hear my test script helped you get past your mental roadblock.

Okay, branch deleted!