This PR fixes an issue where the prediction data does not take cluster_selection_epsilon into account. The bug surfaces as wrong predictions from approximate_predict and incorrect exemplars_.
Code to reproduce the problem:
import hdbscan
from sklearn.datasets import make_blobs

blobs, _ = make_blobs(100, n_features=8, centers=10, random_state=42)

# use a high epsilon to force fewer clusters; with real-world data this happens more easily
clusterer = hdbscan.HDBSCAN(cluster_selection_epsilon=12.0, prediction_data=True)
clusterer.fit(blobs)

# 7 clusters according to the labels
print(clusterer.labels_.max() + 1)

# but 10 clusters according to the exemplars
print(len(clusterer.exemplars_))

# [5, 4, 3, 0, 5, 5, 6, 0, 5, 1]
print(clusterer.labels_[:10])

# predicting assigns points to completely different clusters (and a different number of clusters!)
# [6, 5, 4, 0, 6, 6, 9, 0, 6, 2]
predicted_labels, _ = hdbscan.approximate_predict(clusterer, blobs[:10])
print(predicted_labels)
I tracked the issue down to the prediction data selecting clusters from the tree differently from how it's done in _hdbscan_tree.pyx. The fix is to return the selected clusters from get_clusters in _hdbscan_tree.pyx and use the same clusters for prediction.
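A quick way to sanity-check the change (a sketch for verification, not output from this PR): once fitting and prediction share the same selected clusters, the exemplar count should match the number of labels, and approximate_predict on the training points should broadly agree with labels_. The exact agreement figure will depend on the data and on the approximate-prediction heuristics.

import numpy as np
import hdbscan
from sklearn.datasets import make_blobs

blobs, _ = make_blobs(100, n_features=8, centers=10, random_state=42)
clusterer = hdbscan.HDBSCAN(cluster_selection_epsilon=12.0, prediction_data=True)
clusterer.fit(blobs)

n_clusters = clusterer.labels_.max() + 1

# the exemplars should describe the same clusters the labels use
print(n_clusters, len(clusterer.exemplars_))

# predicting the training points should broadly agree with the fitted labels
predicted, _ = hdbscan.approximate_predict(clusterer, blobs)
agreement = np.mean(predicted == clusterer.labels_)
print(f"agreement with labels_: {agreement:.2%}")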
This likely fixes #308