haifengl / smile

Statistical Machine Intelligence & Learning Engine
https://haifengl.github.io
Other

HDBScan #423

Closed: SomeUserName1 closed this issue 5 years ago

SomeUserName1 commented 5 years ago

Feature Contribution: HDBScan

Hi,

I'd like to implement HDBScan in Java as part of my bachelor's project, using and extending SMILE.

As requested in Contribution.md, I am opening an issue.

My approach would be to port the Python lib as closely as possible to achieve comparable performance. This includes the other algorithms it relies on, e.g. the Dual Tree Boruvka Minimum Spanning Tree.
Some guidance on where to put the classes (besides HDBScan, which obviously belongs in core/clustering) and on whether certain components already exist 1:1 (e.g. the metrics from here) would be very helpful.

Best, Fabian

haifengl commented 5 years ago

It would be wonderful. Thanks!

Clearly, HDBScan would go in core's smile.clustering package. There are many distance/metric classes in the math module (smile.math.distance). Check out the DBScan class: the user can pass either a distance metric or a nearest neighbor search data structure (e.g. KDTree). Also check out the smile.neighbor package (in the core module).
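
To illustrate how a user-supplied distance metric could feed into HDBScan, here is a minimal Java sketch of the mutual reachability distance, the quantity HDBScan builds its minimum spanning tree over. Everything in it is an assumption for illustration: `PointDistance` and `MutualReachability` are hypothetical names, not SMILE classes, and the k-nearest-neighbor step is brute force where a real port would more likely use a search structure such as a KDTree. The core distance here excludes the point itself, which is one of several conventions in use.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a pluggable metric; NOT SMILE's actual interface. */
interface PointDistance {
    double d(double[] a, double[] b);
}

/** Sketch of HDBScan's mutual reachability distance (hypothetical helper class). */
final class MutualReachability {

    /** Plain Euclidean distance, used here as the example metric. */
    static final PointDistance EUCLIDEAN = (a, b) -> {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    };

    /**
     * Core distance of each point: the distance to its k-th nearest
     * neighbor, not counting the point itself. Brute force, O(n^2 log n).
     */
    static double[] coreDistances(double[][] data, int k, PointDistance metric) {
        int n = data.length;
        if (k < 1 || k >= n) {
            throw new IllegalArgumentException("k must be in [1, n - 1]");
        }
        double[] core = new double[n];
        for (int i = 0; i < n; i++) {
            double[] dist = new double[n - 1];
            int idx = 0;
            for (int j = 0; j < n; j++) {
                if (j != i) {
                    dist[idx++] = metric.d(data[i], data[j]);
                }
            }
            Arrays.sort(dist);
            core[i] = dist[k - 1];
        }
        return core;
    }

    /** max(core(i), core(j), d(i, j)): the edge weight used for the spanning tree. */
    static double mutualReachability(double[][] data, double[] core,
                                     int i, int j, PointDistance metric) {
        return Math.max(Math.max(core[i], core[j]), metric.d(data[i], data[j]));
    }

    public static void main(String[] args) {
        double[][] data = {{0, 0}, {1, 0}, {5, 0}};
        double[] core = coreDistances(data, 1, EUCLIDEAN);
        System.out.println(Arrays.toString(core));
        System.out.println(mutualReachability(data, core, 0, 2, EUCLIDEAN));
    }
}
```

Keeping the metric behind a small interface means the same code works with any distance function, which mirrors the design described above where the caller chooses the metric or the neighbor search structure.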

SomeUserName1 commented 5 years ago

I was told to popen the jar instead of implementing it myself, and to focus on a survey of related work rather than implementing something that is state of the art :( Sorry for the false alarm.