Closed tyrcho closed 9 years ago
Hi tyrcho, so far I've not thought about implementing the average linking strategy using a weighted pair group method (because i don't need it). But if you manage to provide the implementation of such linkage strategy (and unit tests) I would certainly accept a pull request. The main problem with the current software design is probably the LinkageStrategy interface which would need to change. Hence this extension will be quite invasive. But this is rather a weakness of the current design. So go for it!
Thanks for the quick reply ! I gave a shot at the implementation but I realized it's going to be more complex than I had in mind.
Adding weights to elements adds more than just weights to distance (which doesn't mean much). Actually it means I need to keep track of the coordinates of all my elements instead of just the distance matrix, and build the barycentre of the elements when I put them in a cluster. It would change a LOT of the current interfaces ...
I'll keep you posted if I have something concrete.
Edit : after second thought, maybe it's not that bad ... I wonder if the distance of the new cluster to other elements can be considered as the weighted average of distances from the 2 agglomerated elements. I still need to check it works regardless of dimensions (in my context I'll have up to several hundred dimensions), my math skills are a bit rusty ...
Seems to work nicely. I merged dev branch into master.
Created new release 1.1.0 and published it on maven central.
Hi,
I'd like to use a new linking strategy which takes into account another dimension : the weight of elements I'm trying to sort into clusters.
So I'd need to add a new array together with the elements with their weights and a new Strategy which would make an weighted average of the distances. This also means that the weights need to be carried together with the clusters and passed to the strategy.
What's you advice ? Is it something compatible with what you already implemented ? Would you accept a PR with an implementation of that ?
My use case : I'm trying to sort out decks in card games (magic, hearthstone) into archetypes, and I'd like to take into account the popularity of the decks.