Dear maintainers,
This pull request adds the branch detection functionality from our preprint (https://arxiv.org/abs/2311.15887). Our main goal was to detect branch hierarchies within already detected clusters. These hierarchies describe cluster shapes, which can reveal subgroups not expressed in the density profile.
I have tried to add the functionality in a way that minimises its impact on the codebase. I settled on the pattern used by the prediction module. The main usage pattern now looks like this:
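As a minimal, self-contained sketch of that two-step pattern: the clusterer is fitted first with an opt-in flag that retains extra data, and a separate detector is then fitted on the already fitted clusterer. The classes `ToyClusterer` and `ToyBranchDetector`, the `keep_branch_data` flag, and the splitting logic are hypothetical stand-ins chosen so the example runs without the library. They are not the actual `HDBSCAN` or `BranchDetector` API, and they do not implement the algorithm from the preprint.

```python
class ToyClusterer:
    """Hypothetical stand-in for a clusterer such as HDBSCAN."""

    def __init__(self, keep_branch_data=False):
        # Assumed opt-in flag: keep the data a later detector needs.
        self.keep_branch_data = keep_branch_data

    def fit(self, points):
        # Stand-in clustering of 1-D points: negatives -> 0, the rest -> 1.
        self.labels_ = [0 if x < 0 else 1 for x in points]
        self.branch_data_ = list(points) if self.keep_branch_data else None
        return self


class ToyBranchDetector:
    """Hypothetical stand-in for a detector fitted on a fitted clusterer."""

    def fit(self, clusterer):
        if clusterer.branch_data_ is None:
            raise ValueError("fit the clusterer with keep_branch_data=True")
        points = clusterer.branch_data_
        self.cluster_labels_ = list(clusterer.labels_)
        # Stand-in sub-grouping: split each cluster at its own mean.
        self.branch_labels_ = []
        for x, c in zip(points, self.cluster_labels_):
            members = [p for p, l in zip(points, self.cluster_labels_) if l == c]
            mean = sum(members) / len(members)
            self.branch_labels_.append(0 if x < mean else 1)
        # Combined label: one id per (cluster, branch) pair.
        self.labels_ = [
            2 * c + b for c, b in zip(self.cluster_labels_, self.branch_labels_)
        ]
        return self


points = [-3.0, -2.5, -1.0, 1.0, 2.0, 4.0]
clusterer = ToyClusterer(keep_branch_data=True).fit(points)
detector = ToyBranchDetector().fit(clusterer)
print(clusterer.labels_)  # cluster labels only
print(detector.labels_)   # labels refined per (cluster, branch) pair
```

The point of the sketch is the separation of concerns: code that only needs `clusterer.labels_` never touches the detector, and the detector refuses to run unless the clusterer was asked to keep the extra data.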
The `BranchDetector` class mimics the `HDBSCAN` class and provides access to labels, membership probabilities, the detected hierarchies, and more. This way, end users who just want clusters do not have to interact with the branch detection functionality at all.

I needed to make a couple of unrelated changes in the Cython code to make all tests pass on my machine. I will try to mark these changes with review comments in the PR. Please advise on whether I should remove these changes from the PR or keep them in.
I hope you will consider merging this PR. Let me know if anything needs to be fixed or changed to better match your vision for the project.
Kind regards,
Jelmer Bot