Closed brainbytes42 closed 2 years ago
The NeighboursQueryFactory
is the interface you're after, using this constructor (https://github.com/oracle/tribuo/blob/main/Clustering/Hdbscan/src/main/java/org/tribuo/clustering/hdbscan/HdbscanTrainer.java#L212), but it looks like we still use the distance function independently of the query factory in HDBSCAN so supplying your own implementation won't work today. We can't change the distance enum without breaking backwards compatibility, but we can look at switching it over to be an interface in the next major release. The enum for NeighboursQueryFactoryType
should be a convenience and not be required in at least one codepath, so that shouldn't be a problem.
Wasn't too hard to migrate over to a Distance
interface without breaking compatibility, so we'll get this in before 4.3.
Hi there,
I was really happy to find a Java-HDBSCAN-Implementation, but at the same time, I'm a little sad, why there are Enums used for the initialization? DistanceType and NeighboursQueryFactoryType are both enums and therefore not extendable with custom Metrics. https://github.com/oracle/tribuo/blob/04c6fc35f95ec3843c0e2df3da88b91554c1abc1/Clustering/Hdbscan/src/main/java/org/tribuo/clustering/hdbscan/HdbscanTrainer.java#L197 For my usecase, I have a custom distance-Matric and also a matching custom Datastructure to provide nearest neighbours (it uses the distance between shapes), but I'd need to change the enum and can't just implement a simple interface and inject those custom parts. Therefore, I think it would be good to migrate those enums to implement an interface that is used for the algorithm; then I could implement the interface and also use my code in the algorithm.