Algorithm parameter "best" option?

Hi,

According to the API documentation, the algorithm parameter is set as follows:

algorithm : string, optional (default='best')

Exactly which algorithm to use; hdbscan has variants specialised for different characteristics of the data. By default this is set to best, which chooses the "best" algorithm given the nature of the data. You can force other options if you believe you know better. Options are: best, generic, prims_kdtree, prims_balltree, boruvka_kdtree, boruvka_balltree.

Is there any available comparison of these options on different real-world datasets?
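A comparison like the one asked about could start from a small timing harness. The sketch here is hypothetical: `benchmark`, `fit_hdbscan`, `min_cluster_size=10`, and the recorded statistics are assumptions chosen for illustration. Only the option names come from the documentation quoted above, and `fit_hdbscan` assumes the `hdbscan` package's `HDBSCAN(algorithm=..., min_cluster_size=...)` estimator and its `labels_` attribute.

```python
import time
from typing import Callable, Dict, Iterable, Sequence

# The forced (non-"best") options listed in the hdbscan documentation.
ALGORITHMS = ("generic", "prims_kdtree", "prims_balltree",
              "boruvka_kdtree", "boruvka_balltree")


def benchmark(fit: Callable[[object, str], Sequence[int]],
              datasets: Dict[str, object],
              algorithms: Iterable[str] = ALGORITHMS,
              repeats: int = 3) -> Dict[tuple, dict]:
    """Time fit(data, algorithm) for every dataset/algorithm pair.

    Hypothetical harness: `fit` must return one cluster label per point,
    with -1 meaning noise. `repeats` must be at least 1.
    """
    algorithms = tuple(algorithms)  # allow any iterable to be reused per dataset
    results = {}
    for name, data in datasets.items():
        for algorithm in algorithms:
            timings = []
            labels = []
            for _ in range(repeats):
                start = time.perf_counter()
                labels = list(fit(data, algorithm))
                timings.append(time.perf_counter() - start)
            n_points = len(labels)
            results[(name, algorithm)] = {
                # Minimum over repeats dampens interference from other processes.
                "best_seconds": min(timings),
                "n_clusters": len(set(labels) - {-1}),
                "noise_fraction": labels.count(-1) / n_points if n_points else 0.0,
            }
    return results


def fit_hdbscan(X, algorithm, min_cluster_size=10):
    """Adapter for the hdbscan package (assumes `pip install hdbscan`)."""
    import hdbscan  # imported lazily so the harness itself needs only the stdlib
    return hdbscan.HDBSCAN(algorithm=algorithm,
                           min_cluster_size=min_cluster_size).fit(X).labels_
```

With a NumPy array `X`, a call such as `benchmark(fit_hdbscan, {"my_data": X})` would time every forced option on that dataset. Which real datasets to use is left open, as in the question.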

Is there an explanation of this statement: "By default this is set to best which chooses the 'best' algorithm given the nature of the data"? In other words, how does the best option decide which algorithm to use?

Thank you for the contribution.

Actually, no, I don't have a good comprehensive comparison. Right now the 'best' option is a heuristic based on some grid-search-style comparisons between the different approaches, which are not ready for publication. A 'best' option should always exist (as it does for similar sklearn classes), but exactly how it should make its choice is another matter. I would be very happy if you wanted to do such a comprehensive comparison; I would gladly add it to the official documentation and use it to better define the selection process when 'best' is selected.
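The answer describes 'best' as a heuristic rather than a fixed, documented rule. Purely to illustrate what a rule-based selector of that kind might look like, here is a hypothetical sketch. The function name `choose_algorithm`, the metric sets, and the 60-feature cutoff are assumptions made for this example; they do not come from the thread and should not be read as hdbscan's actual selection logic.

```python
# Hypothetical rule-based "best" selector for hdbscan's `algorithm` option.
# NOT the library's implementation; every rule below is an assumption.

# Illustrative (incomplete) metric sets; scikit-learn's KDTree.valid_metrics
# and BallTree.valid_metrics are the authoritative lists.
KDTREE_METRICS = {"euclidean", "l2", "minkowski", "manhattan",
                  "cityblock", "l1", "chebyshev"}
BALLTREE_METRICS = KDTREE_METRICS | {"haversine", "canberra", "braycurtis"}


def choose_algorithm(n_features, metric="euclidean", sparse=False,
                     dim_threshold=60):
    """Return one of the documented option names for the given data."""
    # Precomputed distances, sparse input, or a metric no space tree
    # supports: fall back to the generic (tree-free) algorithm.
    if metric == "precomputed" or sparse or metric not in BALLTREE_METRICS:
        return "generic"
    # Prefer a KD-tree when the metric allows it, otherwise a ball tree.
    tree = "kdtree" if metric in KDTREE_METRICS else "balltree"
    # Assumed rule: Boruvka in low dimensions, Prim's above the cutoff,
    # on the premise that tree-based pruning weakens as dimension grows.
    strategy = "prims" if n_features > dim_threshold else "boruvka"
    return f"{strategy}_{tree}"
```

A measured comparison of the kind the maintainer invites is what would replace the guessed cutoff and metric sets with justified ones.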

(Reply to the question posted by Claudio Sanhueza on Sun, Jan 22, 2017 at 11:31 PM.)

scikit-learn-contrib/hdbscan, issue #83