scikit-learn-contrib / hdbscan

A high performance implementation of HDBSCAN clustering.
http://hdbscan.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

dist_metrics error with default settings #139

Open architec997 opened 6 years ago

architec997 commented 6 years ago

Producing a simple dataframe via

import numpy as np
import pandas as pd
import hdbscan

x = np.linspace(0, 100, 200)
y = np.arange(0, 200)
xy, _ = np.meshgrid(x, y)
noise = 0.3 * np.random.random((200, 200))
series = np.sin(xy + 5 * noise) + noise
series[0, :] += 10 * np.random.random(200)
data = pd.DataFrame(series)

I try to run HDBSCAN clustering with the default arguments

clusterer = hdbscan.HDBSCAN().fit(data)

And get the following error

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-58-31e012db38f8> in <module>()
      1 from sklearn.cluster import DBSCAN
----> 2 clusterer = hdbscan.HDBSCAN().fit(data)

~/anaconda3/envs/py36/lib/python3.6/site-packages/hdbscan/hdbscan_.py in fit(self, X, y)
    814          self._condensed_tree,
    815          self._single_linkage_tree,
--> 816          self._min_spanning_tree) = hdbscan(X, **kwargs)
    817 
    818         if self.prediction_data:

~/anaconda3/envs/py36/lib/python3.6/site-packages/hdbscan/hdbscan_.py in hdbscan(X, min_cluster_size, min_samples, alpha, metric, p, leaf_size, algorithm, memory, approx_min_span_tree, gen_min_span_tree, core_dist_n_jobs, cluster_selection_method, allow_single_cluster, match_reference_implementation, **kwargs)
    534                     _hdbscan_prims_kdtree)(X, min_samples, alpha,
    535                                            metric, p, leaf_size,
--> 536                                            gen_min_span_tree, **kwargs)
    537             else:
    538                 (single_linkage_tree, result_min_span_tree) = memory.cache(

~/anaconda3/envs/py36/lib/python3.6/site-packages/sklearn/externals/joblib/memory.py in __call__(self, *args, **kwargs)
    360 
    361     def __call__(self, *args, **kwargs):
--> 362         return self.func(*args, **kwargs)
    363 
    364     def call_and_shelve(self, *args, **kwargs):

~/anaconda3/envs/py36/lib/python3.6/site-packages/hdbscan/hdbscan_.py in _hdbscan_prims_kdtree(X, min_samples, alpha, metric, p, leaf_size, gen_min_span_tree, **kwargs)
    168 
    169     # TO DO: Deal with p for minkowski appropriately
--> 170     dist_metric = DistanceMetric.get_metric(metric, **kwargs)
    171 
    172     # Get distance to kth nearest neighbour

TypeError: descriptor 'get_metric' requires a 'hdbscan.dist_metrics.DistanceMetric' object but received a 'str'

I tried explicitly specifying other metrics with metric='manhattan' etc. as an argument; it did not help.
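
A minimal sketch of the precomputed-distance route tried later in this thread (see farfan92's comment below), assuming the failure is confined to the metric-name lookup in DistanceMetric.get_metric; the pairwise_distances call and the toy X are illustrative, not from the original report:

# Sketch only: bypass the string-based metric lookup by precomputing distances.
import hdbscan
import numpy as np
from sklearn.metrics import pairwise_distances

X = np.random.random((200, 5))                 # stand-in for the real data
D = pairwise_distances(X, metric='manhattan')  # dense (n, n) distance matrix

clusterer = hdbscan.HDBSCAN(metric='precomputed').fit(D)
print(clusterer.labels_)

Note that farfan92 reports below that this route still hit a NameError with the conda-forge build, so treat it as a diagnostic rather than a fix.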

lmcinnes commented 6 years ago

I suspect this is an arg order issue in the code somewhere, possibly due to additions. This is a little disconcerting. Let me see if I can track this down later today.

lmcinnes commented 6 years ago

Sorry, I ran out of time today. I'll have to try and get to this a little later. My apologies for the delay.

farfan92 commented 6 years ago

Also getting "TypeError: descriptor 'get_metric' requires a 'hdbscan.dist_metrics.DistanceMetric' object but received a 'str'", even when just using the simple case in the documentation.

farfan92 commented 6 years ago

Error occurs with RobustSingleLinkage as well.

To avoid the get_metric method receiving the string 'euclidean' or 'manhattan' etc. instead of the expected object, I switched to a precomputed distance matrix. Now I'm getting:

clusterer = hdbscan.HDBSCAN(min_cluster_size=5, min_samples=None, metric='precomputed').fit(gower_df)

NameError                                 Traceback (most recent call last)
<ipython-input-...> in <module>()
----> 1 clusterer = hdbscan.HDBSCAN(min_cluster_size=5, min_samples=None, metric='precomputed').fit(D)

C:\Users\centec7\AppData\Local\Continuum\Anaconda3\lib\site-packages\hdbscan\hdbscan_.py in fit(self, X, y)
    814          self._condensed_tree,
    815          self._single_linkage_tree,
--> 816          self._min_spanning_tree) = hdbscan(X, **kwargs)
    817
    818         if self.prediction_data:

C:\Users\centec7\AppData\Local\Continuum\Anaconda3\lib\site-packages\hdbscan\hdbscan_.py in hdbscan(X, min_cluster_size, min_samples, alpha, metric, p, leaf_size, algorithm, memory, approx_min_span_tree, gen_min_span_tree, core_dist_n_jobs, cluster_selection_method, allow_single_cluster, match_reference_implementation, **kwargs)
    526                     _hdbscan_generic)(X, min_samples,
    527                                       alpha, metric, p, leaf_size,
--> 528                                       gen_min_span_tree, **kwargs)
    529         elif metric in KDTree.valid_metrics:
    530             # TO DO: Need heuristic to decide when to go to boruvka;

C:\Users\centec7\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\externals\joblib\memory.py in __call__(self, *args, **kwargs)
    281         return _load_output(self._output_dir, _get_func_fullname(self.func),
    282                             timestamp=self.timestamp,
--> 283                             metadata=self.metadata, mmap_mode=self.mmap_mode,
    284                             verbose=self.verbose)
    285

C:\Users\centec7\AppData\Local\Continuum\Anaconda3\lib\site-packages\hdbscan\hdbscan_.py in _hdbscan_generic(X, min_samples, alpha, metric, p, leaf_size, gen_min_span_tree, **kwargs)
     85                                              min_samples, alpha)
     86
---> 87     min_spanning_tree = mst_linkage_core(mutual_reachability_)
     88
     89     # mst_linkage_core does not generate a full minimal spanning tree

hdbscan/_hdbscan_linkage.pyx in hdbscan._hdbscan_linkage.mst_linkage_core (hdbscan\_hdbscan_linkage.c:2894)()

hdbscan/_hdbscan_linkage.pyx in hdbscan._hdbscan_linkage.mst_linkage_core (hdbscan\_hdbscan_linkage.c:2281)()

NameError: name 'np' is not defined

lmcinnes commented 6 years ago

Sorry, I'm having trouble reproducing this. Can you tell me a little more about your setup?

architec997 commented 6 years ago

I also checked - I have the same error using a precomputed distance matrix as farfan92.

Ubuntu 17.10, Anaconda 5.0.1, Python 3.6. The packages installed in the venv I'm using:

# packages in environment at /home/vladimir/anaconda3/envs/py36:

farfan92 commented 6 years ago

Updating packages (numpy and sklearn specifically) seems to have removed the NameError. It must have been a compatibility issue introduced after installing some other packages.

lmcinnes commented 6 years ago

I'm glad at least one of you got this resolved. Hopefully refreshing/updating packages will work a second time? I am honestly at a little bit of a loss here.

danielhelf commented 6 years ago

Getting the exact same error message here (descriptor 'get_metric' requires a 'hdbscan.dist_metrics.DistanceMetric' object but received a 'str') despite updating the packages.

Vanwalleghem commented 6 years ago

Got the same error message on an Ubuntu virtual machine with Python 2.7 and a Windows PC with Python 3.6.4, both running the latest version of Anaconda with hdbscan installed through conda-forge. I may try installing it another way tomorrow.

Vanwalleghem commented 6 years ago

Alright, I actually had some time, so I tested that. On the same machine, installing hdbscan via pip worked immediately (after I removed the conda-forge version). Hope that helps you narrow it down and/or fix it for others.

linwoodc3 commented 6 years ago

I also had this error, but it was only present in the conda-forge-installed version of hdbscan, not in the pip-installed version. I removed the conda-forge version, ran pip install hdbscan in my conda environment, and hdbscan works fine.
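
A quick sanity check after switching installs, assuming the problem lives in the compiled dist_metrics extension shipped with the broken build: the call below is the same lookup the tracebacks above show failing, so on a healthy install it should return a DistanceMetric object instead of raising the TypeError.

# Sanity-check sketch: exercise the metric lookup that fails in the tracebacks above.
from hdbscan.dist_metrics import DistanceMetric

print(DistanceMetric.get_metric('euclidean'))  # a DistanceMetric instance, not a TypeError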

lmcinnes commented 6 years ago

@linwoodc3 That's a little weird; the conda-forge version gets synced with the pip version regularly. Perhaps a conda upgrade hdbscan would have done the job? Regardless, you have a working version now, and that's what counts. Thanks for the report; I'll keep an eye out for something amiss like this somewhere along the line.

kevinafra commented 5 years ago

I just got this same error (descriptor 'get_metric' requires a 'hdbscan.dist_metrics.DistanceMetric' object but received a 'str'). I installed hdbscan just yesterday via pip.

I did notice that when I tried to import it, it gave me an error about 'numpy.core.multiarray failed to import' but no reason why. So I imported numpy.core.multiarray manually, and then I was able to import hdbscan. I don't know whether that is a related problem. But attempting to fit some data that I had just fit with sklearn.cluster.DBSCAN failed with the above error when I tried it with hdbscan.

I have Python 2.7.13 and numpy 1.11.2. 'pip check' doesn't find any broken dependencies. What else can I try? I would really like to use hdbscan, as I have data whose clusters are certain to have variable density. Does hdbscan require Python 3.x perhaps, along with all of the dependent versions of numpy, Cython, etc.?
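
A small diagnostic sketch based on the report above, assuming the 'numpy.core.multiarray failed to import' message points to a numpy binary mismatch with hdbscan's compiled extensions; the version numbers in the comments are the reporter's, not requirements:

# Diagnostic sketch: mirror the import order described above, then retry a tiny fit.
import numpy
print(numpy.__version__)          # 1.11.2 was reported; a newer numpy may help

import numpy.core.multiarray      # the import named in the original error
import hdbscan                    # per the report, imports once the line above runs

# Re-running a small fit after upgrading numpy and reinstalling hdbscan (so its
# C extensions rebuild against the new numpy) should show whether the TypeError persists.
data = numpy.random.random((50, 3))
print(hdbscan.HDBSCAN(min_cluster_size=5).fit(data).labels_)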