mhahsler / dbscan

Density Based Clustering of Applications with Noise (DBSCAN) and Related Algorithms - R package
GNU General Public License v3.0

Implement Density-Based Clustering Validation (DBCV) #39

Open mhahsler opened 4 years ago

mhahsler commented 4 years ago

http://epubs.siam.org/doi/pdf/10.1137/1.9781611973440.96

michaelgaunt404 commented 2 years ago

What is the status of this feature?

I have run into a problem that would benefit from this enhancement and was wondering whether it is in development.

mhahsler commented 2 years ago

I think Matt is busy. I have added the label "help wanted".

m-muecke commented 3 months ago

The authors' reference implementation, written in MATLAB: https://github.com/pajaskowiak/dbcv

@mhahsler Is the goal to write the implementation in C++ or R?

mhahsler commented 3 months ago

@m-muecke Probably a mixture. There may be some parts of the algorithm that require fast loops that cannot be implemented using matrix operations, and that should be done in C++. You can start with a pure R version and then replace the time-critical parts with C++ if needed.

mhahsler commented 3 months ago

@peekxc Hi Matt. It looks like there is a functioning version of DBCV in the 6-year-old branch dbcv. Is it worth looking at it and trying to merge it into the current master, or should we start over?

peekxc commented 2 months ago

Hey there. I don't remember much about this code, but from what I can loosely recall, I finished a basic implementation of the measure. Where I left off was making test cases to validate that the measure was correct and improving the robustness of the code to make it package-worthy.

Right now the branch looks like it includes a C++ implementation of the fundamental primitive (the "all-points-core-distance") needed for computing the validation measure (here's the diff), along with some R code that puts it all together.
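For readers unfamiliar with the primitive, here is a minimal sketch of the all-points-core-distance as I understand it from the DBCV paper: for an object o in a cluster with n objects in d dimensions, it is ((sum over the other n - 1 objects of (1 / dist(o, x))^d) / (n - 1))^(-1/d). This is not the code on the branch. The names `Point`, `euclidean`, and `all_points_core_distance` are hypothetical, Euclidean distance is assumed, and the handling of singleton clusters and duplicate points is my own assumption rather than something the paper or the branch prescribes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical type: one point is a vector of d coordinates.
using Point = std::vector<double>;

// Euclidean distance between two points of equal dimension.
double euclidean(const Point& a, const Point& b) {
    double s = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        const double diff = a[k] - b[k];
        s += diff * diff;
    }
    return std::sqrt(s);
}

// All-points-core-distance of every object in ONE cluster, computed
// with respect to all other objects of that same cluster.
// Assumptions of this sketch (not taken from the branch):
//   - clusters with fewer than 2 points get a core distance of 0;
//   - a point with an exact duplicate gets a core distance of 0,
//     which is the limit of the formula as a distance goes to 0.
std::vector<double> all_points_core_distance(const std::vector<Point>& cluster) {
    const std::size_t n = cluster.size();
    std::vector<double> core(n, 0.0);
    if (n < 2) return core;

    const double d = static_cast<double>(cluster[0].size());
    for (std::size_t i = 0; i < n; ++i) {
        double sum = 0.0;
        bool has_duplicate = false;
        for (std::size_t j = 0; j < n; ++j) {
            if (j == i) continue;
            const double dist = euclidean(cluster[i], cluster[j]);
            if (dist == 0.0) { has_duplicate = true; break; }
            sum += std::pow(1.0 / dist, d);  // (1 / dist)^d
        }
        if (!has_duplicate) {
            core[i] = std::pow(sum / static_cast<double>(n - 1), -1.0 / d);
        }
    }
    return core;
}
```

The double loop is O(n^2) per cluster, which is the kind of loop that is slow in pure R and a natural candidate for C++. A hand-checkable case for a unit test: for the two 2-D points (0, 0) and (3, 4), the formula reduces to the distance between them, so both core distances should be 5.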

If we could generate some test data sets and some unit tests, I think it could be worth pulling in. Whether the code on the branch is used or the code is written from scratch, I think we would mainly want to be sure it's implemented correctly (as much as we can).

mhahsler commented 2 months ago

@m-muecke Do you want to look into Matt's code to add and test DBCV?

m-muecke commented 2 months ago

Sure, I should find time in the upcoming weeks.