scikit-learn-contrib / hdbscan

A high performance implementation of HDBSCAN clustering.
http://hdbscan.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Inconsistencies in Core Distance Computation #560

Open tarang-jain opened 2 years ago

tarang-jain commented 2 years ago

Core distances are computed inconsistently: during training the (min_samples+1)-th neighbor is used, but when building the PredictionData object the (min_samples)-th neighbor is used. The specific lines I am referring to are: https://github.com/scikit-learn-contrib/hdbscan/blob/379d523d4e6b059db30970c8f5a08f383d5f3a6f/hdbscan/hdbscan.py#L245

and

https://github.com/scikit-learn-contrib/hdbscan/blob/379d523d4e6b059db30970c8f5a08f383d5f3a6f/hdbscan/prediction.py#L103
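
To illustrate the effect of the off-by-one, here is a minimal pure-Python sketch. It is not the library's code: the helper `kth_neighbor_distances` and the toy 1-D data are hypothetical, and it assumes the k-th neighbor is counted with the query point itself as the first neighbor (distance 0), as happens when a tree is queried with its own training points.

```python
def kth_neighbor_distances(points, k):
    """Distance from each point to its k-th nearest neighbor.

    Hypothetical helper for illustration: the point itself counts as
    the 1st neighbor (distance 0), so k must be at least 1.
    """
    result = []
    for p in points:
        dists = sorted(abs(p - q) for q in points)
        result.append(dists[k - 1])
    return result


# Toy 1-D data set (assumed values, chosen only for illustration).
points = [0.0, 1.0, 3.0, 6.0, 10.0]
min_samples = 2

# Core distances taken from the (min_samples + 1)-th neighbor.
training_style = kth_neighbor_distances(points, min_samples + 1)

# Core distances taken from the (min_samples)-th neighbor.
prediction_style = kth_neighbor_distances(points, min_samples)

for p, t, q in zip(points, training_style, prediction_style):
    print(f"point={p:5.1f}  k=min_samples+1: {t:4.1f}  k=min_samples: {q:4.1f}")
```

With these conventions the two choices of k give different core distances for every point in the toy data, so any mutual reachability distances derived from them would differ as well.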

traderjoesbrownielover commented 2 years ago

Hey, I would be willing to work on this. Can this please be assigned to me?