scikit-tda / kepler-mapper

Kepler Mapper: A flexible Python implementation of the Mapper algorithm.
https://kepler-mapper.scikit-tda.org
MIT License
623 stars 180 forks source link

Bug: min_cluster_samples should not be set to a non-integer #223

Closed michaelkerber closed 3 years ago

michaelkerber commented 3 years ago

Is your feature request related to a problem? Please describe. KeplerMapper does not work in combination with sklearn.clustering.AgglomerativeClustering (using distance_threshold). It seems that KeplerMapper relies on the parameter "min_clusters/n_clusters/min_samples" to be available (kmapper.py, line 506).

Describe the solution you'd like The documentation should be more clear about what is expected from the clustering method to work in KeplerMapper.

deargle commented 3 years ago

That code block has a final fallback of 2 if none of the previous are available: https://github.com/scikit-tda/kepler-mapper/blob/56dbd6642b69830f329d272bf9f6c5e877386cb8/kmapper/kmapper.py#L506-L511

deargle commented 3 years ago

Also, here's an example of someone using AgglomerativeClustering https://github.com/scikit-tda/kepler-mapper/issues/185#issuecomment-571255773 , what error message are you getting? Full stack trace please?

michaelkerber commented 3 years ago

I tried this clustering method: cluster_method=sklearn.cluster.AgglomerativeClustering(n_clusters=None,distance_threshold=0.2,compute_full_tree=True,linkage='single')

Traceback (most recent call last): File "mapper_cat.py", line 32, in cover=my_cover, File "/home/mkerber/.local/lib/python3.7/site-packages/kmapper/kmapper.py", line 529, in map if hypercube.shape[0] >= min_cluster_samples: TypeError: '>=' not supported between instances of 'int' and 'NoneType'

I think the error makes sense, since n_clusters is set, just not to an integer. The documentation of AgglomerativeClustering says that n_clusters has to be set to None if distance_threshold is set.

Perhaps you can check whether the parameter is an integer and if not, set it to 2 instead?

deargle commented 3 years ago

Indeed! Thanks for the bug report.I remember now that @torlarse had hard-coded kmapper to use min_cluster_samples = 2 when they posted the code in that comment. Maybe you can try that until a patch is released?