oracle / tribuo

Tribuo - A Java machine learning library
https://tribuo.org
Apache License 2.0
1.27k stars 176 forks source link

Limitid extendability by using Enums #284

Closed brainbytes42 closed 2 years ago

brainbytes42 commented 2 years ago

Hi there,

I was really happy to find a Java-HDBSCAN-Implementation, but at the same time, I'm a little sad, why there are Enums used for the initialization? DistanceType and NeighboursQueryFactoryType are both enums and therefore not extendable with custom Metrics. https://github.com/oracle/tribuo/blob/04c6fc35f95ec3843c0e2df3da88b91554c1abc1/Clustering/Hdbscan/src/main/java/org/tribuo/clustering/hdbscan/HdbscanTrainer.java#L197 For my usecase, I have a custom distance-Matric and also a matching custom Datastructure to provide nearest neighbours (it uses the distance between shapes), but I'd need to change the enum and can't just implement a simple interface and inject those custom parts. Therefore, I think it would be good to migrate those enums to implement an interface that is used for the algorithm; then I could implement the interface and also use my code in the algorithm.

Craigacp commented 2 years ago

The NeighboursQueryFactory is the interface you're after, using this constructor (https://github.com/oracle/tribuo/blob/main/Clustering/Hdbscan/src/main/java/org/tribuo/clustering/hdbscan/HdbscanTrainer.java#L212), but it looks like we still use the distance function independently of the query factory in HDBSCAN so supplying your own implementation won't work today. We can't change the distance enum without breaking backwards compatibility, but we can look at switching it over to be an interface in the next major release. The enum for NeighboursQueryFactoryType should be a convenience and not be required in at least one codepath, so that shouldn't be a problem.

Craigacp commented 2 years ago

Wasn't too hard to migrate over to a Distance interface without breaking compatibility, so we'll get this in before 4.3.