Open chryselectrum opened 6 years ago
I'm afraid that single linkage tree plotting simply won't work with that much data. You'll have to use the condensed tree plots instead. Sorry.
Thank you for your quick response. Using condensed_tree_.plot() I also ran into problems. This time I get a stack overflow:
Fatal Python error: Cannot recover from stack overflow.
Current thread 0x00007fada06ec740 (most recent call first):
File "/usr/lib64/python3.6/site-packages/hdbscan/plots.py", line 36 in _recurse_leaf_dfs
File "/usr/lib64/python3.6/site-packages/hdbscan/plots.py", line 40 in <listcomp>
File "/usr/lib64/python3.6/site-packages/hdbscan/plots.py", line 40 in _recurse_leaf_dfs
File "/usr/lib64/python3.6/site-packages/hdbscan/plots.py", line 40 in <listcomp>
File "/usr/lib64/python3.6/site-packages/hdbscan/plots.py", line 40 in _recurse_leaf_dfs
File "/usr/lib64/python3.6/site-packages/hdbscan/plots.py", line 40 in <listcomp>
File "/usr/lib64/python3.6/site-packages/hdbscan/plots.py", line 40 in _recurse_leaf_dfs
File "/usr/lib64/python3.6/site-packages/hdbscan/plots.py", line 40 in <listcomp>
File "/usr/lib64/python3.6/site-packages/hdbscan/plots.py", line 40 in _recurse_leaf_dfs
File "/usr/lib64/python3.6/site-packages/hdbscan/plots.py", line 40 in <listcomp>
File "/usr/lib64/python3.6/site-packages/hdbscan/plots.py", line 40 in _recurse_leaf_dfs
File "/usr/lib64/python3.6/site-packages/hdbscan/plots.py", line 40 in <listcomp>
File "/usr/lib64/python3.6/site-packages/hdbscan/plots.py", line 40 in _recurse_leaf_dfs
File "/usr/lib64/python3.6/site-packages/hdbscan/plots.py", line 40 in <listcomp>
File "/usr/lib64/python3.6/site-packages/hdbscan/plots.py", line 40 in _recurse_leaf_dfs
File "/usr/lib64/python3.6/site-packages/hdbscan/plots.py", line 40 in <listcomp>
File "/usr/lib64/python3.6/site-packages/hdbscan/plots.py", line 40 in _recurse_leaf_dfs
...
Anything that can be done about it?
That would mean that your min_cluster_size is too low to get a sensible plot out. You will need to increase the min_cluster_size parameter to something rather larger. In doing so you may want to set the min_samples parameter explicitly (otherwise it defaults to whatever value you provide for min_cluster_size). In this case min_samples can probably be set to whatever value you were originally using for min_cluster_size.
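For anyone landing here later, the suggestion above could be sketched roughly as follows. This is only an illustration, not a recommended setting: the helper names plotting_params and fit_and_plot and the scale factor of 10 are made up for this example, and the right min_cluster_size depends on your data. The only library calls used are hdbscan.HDBSCAN(min_cluster_size=..., min_samples=...), fit() and condensed_tree_.plot().

```python
def plotting_params(original_min_cluster_size, scale=10):
    """Build HDBSCAN keyword arguments for a run whose condensed tree is easier to plot.

    `scale` is a hypothetical knob for this sketch; the thread only says to make
    min_cluster_size "rather larger", not by how much.
    """
    if original_min_cluster_size < 1 or scale < 1:
        raise ValueError("original_min_cluster_size and scale must be >= 1")
    return {
        # Larger value: fewer, bigger clusters in the condensed tree.
        "min_cluster_size": original_min_cluster_size * scale,
        # Set explicitly so it does not follow the enlarged min_cluster_size.
        "min_samples": original_min_cluster_size,
    }


def fit_and_plot(data, original_min_cluster_size, scale=10):
    """Fit HDBSCAN with the parameters above and draw the condensed tree."""
    # Imported here so plotting_params can be used without hdbscan installed.
    import hdbscan

    clusterer = hdbscan.HDBSCAN(**plotting_params(original_min_cluster_size, scale))
    clusterer.fit(data)
    clusterer.condensed_tree_.plot()  # needs matplotlib
    return clusterer
```

For example, if you were originally running with min_cluster_size=5, fit_and_plot(data, 5) would fit with min_cluster_size=50 and min_samples=5.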
Thank you for your help. I managed to get a better understanding of the clustering by increasing min_cluster_size and keeping min_samples small.
Glad I could help you get something that worked out for now.
I'm trying to get more information on the clustering using the single_linkage_tree_ figure. However, I'm getting the error stack trace below when calling the single_linkage_tree_.plot() method. I'm trying to cluster around 200000 data points with 100 features.
Any help available on how to tackle the problem?