oracle / tribuo

Tribuo - A Java machine learning library
https://tribuo.org
Apache License 2.0

Is HDBSCAN model export possible? #313

Open kalpit-konverge opened 1 year ago

kalpit-konverge commented 1 year ago

Ask the question: Is it possible to export an HDBSCAN model via ONNX? As far as I am aware, no such functionality exists in Python.

Is your question about a specific ML algorithm or approach? This is about clustering, specifically the HDBSCAN method.

Is your question about a specific Tribuo class? HdbscanModel.java

System details

Additional context: It would be great if I could get some general pointers on how export/import of an HDBSCAN model could be achieved.

Craigacp commented 1 year ago

The HDBSCAN algorithm doesn't naturally have a predict method for determining the cluster assignment of a new point because, strictly speaking, a new point could change the clustering of all the rest of the data. In Tribuo (and also in the Python scikit-learn-contrib HDBSCAN implementation) the prediction method is approximate and based on the nearest neighbour keypoints. It would be possible to export that nearest neighbour search into ONNX, but we've not yet done that for any of Tribuo's nearest neighbour prediction models (K-NN, K-Means, HDBSCAN). Also, as ONNX doesn't naturally have a nearest neighbour op, the export would bake in an exhaustive search of the keypoints (e.g. in the way the scikit-learn K-NN ONNX converter does: https://github.com/onnx/sklearn-onnx/blob/main/skl2onnx/operator_converters/nearest_neighbours.py#L64). We'd accept contributions to add that kind of export support; otherwise it's on the backlog of ONNX model export features and we'll get to it at some point in the future.
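To make the "exhaustive search of the keypoints" concrete, here is a minimal Java sketch of what such a prediction step computes: compare the new point against every stored keypoint and return the cluster of the closest one. This is not Tribuo's API. The class `NearestKeypointSketch`, its method names, the plain `double[]` representation and the choice of Euclidean distance are all assumptions made for illustration, and the real `HdbscanModel` may apply additional logic (for example outlier handling) that is omitted here.

```java
// Hypothetical illustration only: not part of Tribuo or ONNX.
final class NearestKeypointSketch {

    private NearestKeypointSketch() {}

    /**
     * Returns the cluster id of the keypoint closest to the query point,
     * found by scanning every keypoint (an exhaustive search).
     */
    static int predict(double[][] keypoints, int[] clusterIds, double[] query) {
        if (keypoints.length == 0 || keypoints.length != clusterIds.length) {
            throw new IllegalArgumentException(
                "Need at least one keypoint and exactly one cluster id per keypoint");
        }
        int best = 0;
        double bestDist = squaredEuclidean(keypoints[0], query);
        for (int i = 1; i < keypoints.length; i++) {
            double dist = squaredEuclidean(keypoints[i], query);
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return clusterIds[best];
    }

    /** Squared Euclidean distance; the square root is not needed to rank neighbours. */
    static double squaredEuclidean(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }
}
```

An ONNX export along these lines would store the keypoints and their cluster ids as constants in the graph and express the same loop with tensor ops (pairwise distances followed by an arg-min), so the cost of each prediction grows linearly with the number of keypoints.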

Tribuo is unlikely to ever support importing an HDBSCAN model into the HdbscanModel class, though with a small amount of additional work we could support loading in an ONNX clustering model. Currently Tribuo is missing the ClusterID version of the output adaptor class needed for that, which would be straightforward to add.