scikit-learn-contrib / hdbscan

A high performance implementation of HDBSCAN clustering.
http://hdbscan.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Tests fail #238

Open robguinness opened 5 years ago

robguinness commented 5 years ago

Hi, I'm a new user. Just installed and ran the tests, and one is failing, as well as one error:


======================================================================
ERROR: hdbscan.tests.test_rsl.test_rsl_high_dimensional
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/r_and_d/dochier/.venv/lib/python3.5/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/data/r_and_d/dochier/.venv/lib/python3.5/site-packages/hdbscan/tests/test_rsl.py", line 131, in test_rsl_high_dimensional
    V=np.ones(H.shape[1])).fit(H).labels_
TypeError: __init__() got an unexpected keyword argument 'V'

======================================================================
FAIL: hdbscan.tests.test_rsl.test_rsl_is_sklearn_estimator
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/r_and_d/dochier/.venv/lib/python3.5/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/data/r_and_d/dochier/.venv/lib/python3.5/site-packages/hdbscan/tests/test_rsl.py", line 202, in test_rsl_is_sklearn_estimator
    check_estimator(RobustSingleLinkage)
  File "/data/r_and_d/dochier/.venv/lib/python3.5/site-packages/sklearn/utils/estimator_checks.py", line 255, in check_estimator
    check_parameters_default_constructible(name, Estimator)
  File "/data/r_and_d/dochier/.venv/lib/python3.5/site-packages/sklearn/utils/estimator_checks.py", line 1608, in check_parameters_default_constructible
    np.float64, types.FunctionType, Memory])
AssertionError: <class 'dict'> not found in [<class 'str'>, <class 'int'>, <class 'float'>, <class 'bool'>, <class 'tuple'>, <class 'NoneType'>, <class 'numpy.float64'>, <class 'function'>, <class 'sklearn.externals.joblib.memory.Memory'>]

----------------------------------------------------------------------

I installed using pip, and I tested this on both Python 3.5.2 and Python 3.4.3.
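For readers unfamiliar with the second failure: the assertion says that a constructor default of type `dict` is not among the types the scikit-learn check accepts. The sketch below imitates that idea with the standard library only. It is a simplified stand-in, not scikit-learn's actual code. The helper `offending_defaults` and both classes are hypothetical names made up for illustration, and the allowed-type list is copied from the error message minus the NumPy and joblib entries.

```python
import inspect

# Allowed default types, taken from the AssertionError message above
# (NumPy and joblib types are left out to keep this stdlib-only).
ALLOWED_DEFAULT_TYPES = [str, int, float, bool, tuple, type(None), type(lambda: None)]


def offending_defaults(estimator_cls):
    """Return {param_name: default} for defaults whose type is not allowed."""
    params = inspect.signature(estimator_cls.__init__).parameters
    return {
        name: p.default
        for name, p in params.items()
        if name != "self"
        and p.default is not inspect.Parameter.empty
        and type(p.default) not in ALLOWED_DEFAULT_TYPES
    }


class DictDefault:
    """Hypothetical estimator with a mutable dict as a default value."""

    def __init__(self, metric="euclidean", metric_params={}):
        self.metric = metric
        self.metric_params = metric_params


class NoneDefault:
    """Same hypothetical estimator, with None as the default instead."""

    def __init__(self, metric="euclidean", metric_params=None):
        self.metric = metric
        self.metric_params = metric_params


print(offending_defaults(DictDefault))  # {'metric_params': {}}
print(offending_defaults(NoneDefault))  # {}
```

Under these assumptions, switching a `{}` default to `None` (and building the dict later, at fit time) is the kind of change that satisfies such a check.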

lmcinnes commented 5 years ago

Sorry, I'm in the process of fixing these. Neither is serious and neither should impact usage right now, but both are things I would like to get resolved at some point. I'm short on time for this project at the moment, so it may be a little while before they get properly resolved, particularly because they don't overly impact users.

robguinness commented 5 years ago

Ok, thanks for the update.

KarthikMgk commented 5 years ago

Thanks for the update. I was about to report the issue

jstiefel commented 5 years ago

In addition to the exact error and fail mentioned above I'm getting this one:

FAIL: hdbscan.tests.test_hdbscan.test_hdbscan_is_sklearn_estimator
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/julian/segmap_python/lib/python3.5/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/home/julian/segmap_python/lib/python3.5/site-packages/hdbscan/tests/test_hdbscan.py", line 574, in test_hdbscan_is_sklearn_estimator
    check_estimator(HDBSCAN)
  File "/home/julian/segmap_python/lib/python3.5/site-packages/sklearn/utils/estimator_checks.py", line 296, in check_estimator
    check_no_attributes_set_in_init(name, estimator)
  File "/home/julian/segmap_python/lib/python3.5/site-packages/sklearn/utils/testing.py", line 348, in wrapper
    return fn(*args, **kwargs)
  File "/home/julian/segmap_python/lib/python3.5/site-packages/sklearn/utils/estimator_checks.py", line 1996, in check_no_attributes_set_in_init
    % (name, sorted(invalid_attr)))
AssertionError: {'_metric_kwargs', '_relative_validity', '_outlier_scores', '_single_linkage_tree', '_raw_data', '_min_spanning_tree', '_prediction_data', '_condensed_tree'} is not false : Estimator HDBSCAN should not set any attribute apart from parameters during init. Found attributes ['_condensed_tree', '_metric_kwargs', '_min_spanning_tree', '_outlier_scores', '_prediction_data', '_raw_data', '_relative_validity', '_single_linkage_tree'].

In total: 5 skips, 1 error, 2 fails, using Python 3.5.2. Is this still the same issue as mentioned above, or does it impact usage?

Thanks.
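The assertion in this traceback states the rule directly: an estimator should not set any attribute other than its parameters during init. A minimal stdlib-only sketch of that rule follows. The helper `extra_init_attributes` and the two classes are hypothetical and only imitate the idea, they are not scikit-learn's or hdbscan's code.

```python
import inspect


def extra_init_attributes(estimator):
    """Return attribute names set by __init__ that are not constructor parameters."""
    params = set(inspect.signature(type(estimator).__init__).parameters) - {"self"}
    return sorted(set(vars(estimator)) - params)


class EagerState:
    """Hypothetical estimator that creates private state inside __init__."""

    def __init__(self, min_cluster_size=5):
        self.min_cluster_size = min_cluster_size
        self._condensed_tree = None  # not a parameter, so it gets flagged


class LazyState:
    """Hypothetical estimator that defers private state until fit()."""

    def __init__(self, min_cluster_size=5):
        self.min_cluster_size = min_cluster_size

    def fit(self, X):
        self._condensed_tree = None  # placeholder for state computed from X
        return self


print(extra_init_attributes(EagerState()))  # ['_condensed_tree']
print(extra_init_attributes(LazyState()))   # []
```

A failure of this kind concerns estimator conventions rather than clustering results, which fits the maintainer's reply that it is the same low-impact issue.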

lmcinnes commented 5 years ago

Yes, same issue. I will try to get time to work on this eventually, but a number of other matters are more pressing for me right now.

klcooksey commented 5 years ago

I don't want to double up on the tickets so I'll add what I would have reported here. Running on Mac OSX 10.10.5 (Yosemite).

$ python -m nose -s hdbscan
.../opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/hdbscan/hdbscan.py:216: UserWarning: Cannot generate Minimum Spanning Tree; the implemented Prim's does not produce the full minimum spanning tree
  'the full minimum spanning tree ', UserWarning)
./opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/hdbscan/hdbscan.py:252: UserWarning: Cannot generate Minimum Spanning Tree; the implemented Prim's does not produce the full minimum spanning tree
  'the full minimum spanning tree ', UserWarning)
................SS................F

FAIL: hdbscan.tests.test_rsl.test_rsl_is_sklearn_estimator

Traceback (most recent call last):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/hdbscan/tests/test_rsl.py", line 202, in test_rsl_is_sklearn_estimator
    check_estimator(RobustSingleLinkage)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/utils/estimator_checks.py", line 295, in check_estimator
    check_parameters_default_constructible(name, Estimator)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/utils/estimator_checks.py", line 2114, in check_parameters_default_constructible
    np.float64, types.FunctionType, Memory])
AssertionError: <class 'dict'> not found in [<class 'str'>, <class 'int'>, <class 'float'>, <class 'bool'>, <class 'tuple'>, <class 'NoneType'>, <class 'numpy.float64'>, <class 'function'>, <class 'sklearn.externals.joblib.memory.Memory'>]

Ran 39 tests in 10.933s

FAILED (SKIP=2, failures=1)

$ pip --version
pip 19.0.3 from /opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pip (python 3.7)

$ python --version
Python 3.7.2

I was having issues installing with: sudo -H pip install hdbscan so I used: sudo -H pip install --upgrade git+https://github.com/scikit-learn-contrib/hdbscan.git#egg=hdbscan.

lmcinnes commented 5 years ago

Hopefully this 0.8.20 release will fix these lingering issues for now.

klcooksey commented 5 years ago

Unfortunately not because I reinstalled and tried again:

$ pip show hdbscan
Name: hdbscan
Version: 0.8.20
Summary: Clustering based on density with variable density clusters
Home-page: http://github.com/scikit-learn-contrib/hdbscan
Author: None
Author-email: None
License: BSD
Location: /opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages
Requires: cython, scipy, scikit-learn, numpy
Required-by:

$ python -m nose -s hdbscan
.../opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/hdbscan/hdbscan.py:216: UserWarning: Cannot generate Minimum Spanning Tree; the implemented Prim's does not produce the full minimum spanning tree
  'the full minimum spanning tree ', UserWarning)
./opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/hdbscan/hdbscan.py:252: UserWarning: Cannot generate Minimum Spanning Tree; the implemented Prim's does not produce the full minimum spanning tree
  'the full minimum spanning tree ', UserWarning)
................SS......E

ERROR: Failure: NameError (name 'SkipTest' is not defined)

Traceback (most recent call last):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/nose/failure.py", line 39, in runTest
    raise self.exc_val.with_traceback(self.tb)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/nose/loader.py", line 417, in loadTestsFromName
    addr.filename, addr.module)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/imp.py", line 234, in load_module
    return load_source(name, filename, file)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/imp.py", line 171, in load_source
    module = _load(spec)
  File "", line 696, in _load
  File "", line 677, in _load_unlocked
  File "", line 728, in exec_module
  File "", line 219, in _call_with_frames_removed
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/hdbscan/tests/test_rsl.py", line 201, in
    @SkipTest
NameError: name 'SkipTest' is not defined

Ran 29 tests in 11.571s

FAILED (SKIP=2, errors=1)
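The NameError above is raised while the test module is being imported: a decorator name is looked up the moment the `def` statement runs, so a missing import stops the whole file from loading, which would explain the drop from 39 to 29 tests. As a hedged illustration only (this is not the fix made in hdbscan, and the class and test names are made up), here is one way to skip a known-failing check with the standard library's `unittest.skip`:

```python
import unittest


class EstimatorChecks(unittest.TestCase):
    """Hypothetical test case; names are illustrative only."""

    # unittest is imported above, so the decorator resolves at import time.
    @unittest.skip("estimator compliance check is known to fail")
    def test_is_sklearn_estimator(self):
        raise AssertionError("never runs while the test is skipped")

    def test_other(self):
        self.assertTrue(True)


# Run the two tests programmatically and collect the outcome.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(EstimatorChecks)
result = unittest.TestResult()
suite.run(result)

print(len(result.skipped), len(result.failures), len(result.errors))  # 1 0 0
```

The skipped test is reported as a skip rather than an error, and the rest of the module still runs.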

lmcinnes commented 5 years ago

Thanks @klcooksey , it seems I was a little hasty. Is there any chance you can pull from master, build and test if that works?

klcooksey commented 5 years ago

You're welcome; thank you for working on it, @lmcinnes. Uh, I don't actually know how to pull from master via pip. But I did do the following to get more success:

$ sudo -H pip install --upgrade git+https://github.com/scikit-learn-contrib/hdbscan.git

$ pip show hdbscan
Name: hdbscan
Version: 0.8.20
Summary: Clustering based on density with variable density clusters
Home-page: http://github.com/scikit-learn-contrib/hdbscan
Author: None
Author-email: None
License: BSD
Location: /opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages
Requires: scipy, scikit-learn, numpy, cython
Required-by:

$ python -m nose -s hdbscan
.../opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/hdbscan/hdbscan.py:216: UserWarning: Cannot generate Minimum Spanning Tree; the implemented Prim's does not produce the full minimum spanning tree
  'the full minimum spanning tree ', UserWarning)
./opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/hdbscan/hdbscan.py:252: UserWarning: Cannot generate Minimum Spanning Tree; the implemented Prim's does not produce the full minimum spanning tree
  'the full minimum spanning tree ', UserWarning)
................SS................

Ran 38 tests in 10.774s

OK (SKIP=2)

lmcinnes commented 5 years ago

That's okay, I'll see if I can roll a few more minor patches in and get a release out. In practice you can ignore this test failure -- it is unimportant, and shouldn't affect your work in any way.

klcooksey commented 5 years ago

I've never reported a bug before (usually opting to debug myself, ignore, or find another option) but you made this a good experience, @lmcinnes. I'm looking forward to seeing if hdbscan is a good option for my research (clustering analysis of gaseous systems around galaxies).

lmcinnes commented 5 years ago

@klcooksey: Thanks! I'm glad to be able to help. If you have any further issues (including usage questions) don't hesitate to file an issue here. I'm not always super-fast to respond, but I try to do the best I can.

ackorchmaros commented 3 years ago

I tried klcooksey's suggestion, but it didn't work for me.

djaym7 commented 2 years ago

I'm getting the warning below and can't get a DBCV score:

/home/ec2-user/anaconda3/envs/pytorch_latestp36/lib/python3.6/site-packages/hdbscan/hdbscan.py:219: UserWarning: Cannot generate Minimum Spanning Tree; the implemented Prim's does not produce the full minimum spanning tree
  'the full minimum spanning tree ', UserWarning)