scikit-learn-contrib / hdbscan

A high performance implementation of HDBSCAN clustering.
http://hdbscan.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Unable to install hdbscan on colab. #600

Open Raingel opened 1 year ago

Raingel commented 1 year ago

Today I ran into the following error when trying to install hdbscan on Colab.

 error: subprocess-exited-with-error

  × Building wheel for hdbscan (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
  Building wheel for hdbscan (pyproject.toml) ... error
  ERROR: Failed building wheel for hdbscan
Failed to build hdbscan
ERROR: Could not build wheels for hdbscan, which is required to install pyproject.toml-based projects

It worked fine when I installed it last week.

I also tried to install the previous version of hdbscan (0.8.29), but it still failed.


mikeldking commented 1 year ago

Seeing this on our CI builds now as well

error: subprocess-exited-with-error

  × Building wheel for hdbscan (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [168 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-cpython-38
      creating build/lib.linux-x86_64-cpython-38/hdbscan
      copying hdbscan/validity.py -> build/lib.linux-x86_64-cpython-38/hdbscan
      copying hdbscan/plots.py -> build/lib.linux-x86_64-cpython-38/hdbscan
      copying hdbscan/flat.py -> build/lib.linux-x86_64-cpython-38/hdbscan
      copying hdbscan/prediction.py -> build/lib.linux-x86_64-cpython-38/hdbscan
      copying hdbscan/hdbscan_.py -> build/lib.linux-x86_64-cpython-38/hdbscan
      copying hdbscan/__init__.py -> build/lib.linux-x86_64-cpython-38/hdbscan
      copying hdbscan/robust_single_linkage_.py -> build/lib.linux-x86_64-cpython-38/hdbscan
      creating build/lib.linux-x86_64-cpython-38/hdbscan/tests
      copying hdbscan/tests/test_rsl.py -> build/lib.linux-x86_64-cpython-38/hdbscan/tests
      copying hdbscan/tests/test_prediction_utils.py -> build/lib.linux-x86_64-cpython-38/hdbscan/tests
      copying hdbscan/tests/test_flat.py -> build/lib.linux-x86_64-cpython-38/hdbscan/tests
      copying hdbscan/tests/__init__.py -> build/lib.linux-x86_64-cpython-38/hdbscan/tests
      copying hdbscan/tests/test_hdbscan.py -> build/lib.linux-x86_64-cpython-38/hdbscan/tests
      running build_ext
      Compiling hdbscan/_hdbscan_tree.pyx because it changed.
      [1/1] Cythonizing hdbscan/_hdbscan_tree.pyx
      building 'hdbscan._hdbscan_tree' extension
      creating build/temp.linux-x86_64-cpython-38
      creating build/temp.linux-x86_64-cpython-38/hdbscan
      gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/home/runner/.local/share/hatch/env/virtual/arize-phoenix/C8K4HrkP/type/include -I/opt/hostedtoolcache/Python/3.8.17/x64/include/python3.8 -I/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/numpy/core/include -c hdbscan/_hdbscan_tree.c -o build/temp.linux-x86_64-cpython-38/hdbscan/_hdbscan_tree.o
      In file included from /tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/numpy/core/include/numpy/ndarraytypes.h:1830,
                       from /tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
                       from /tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/numpy/core/include/numpy/arrayobject.h:4,
                       from hdbscan/_hdbscan_tree.c:1097:
      /tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
         17 | #warning "Using deprecated NumPy API, disable it with " \
            |  ^~~~~~~
      gcc -shared -Wl,--rpath=/opt/hostedtoolcache/Python/3.8.17/x64/lib -Wl,--rpath=/opt/hostedtoolcache/Python/3.8.17/x64/lib build/temp.linux-x86_64-cpython-38/hdbscan/_hdbscan_tree.o -L/opt/hostedtoolcache/Python/3.8.17/x64/lib -o build/lib.linux-x86_64-cpython-38/hdbscan/_hdbscan_tree.cpython-38-x86_64-linux-gnu.so
      /tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/Cython/Compiler/Main.py:381: FutureWarning: Cython directive 'language_level' not set, using '3str' for now (Py3). This has changed from earlier releases! File: /tmp/pip-install-sir9k2dg/hdbscan_aa682700701c41ffa445f31aed278805/hdbscan/_hdbscan_tree.pyx
        tree = Parsing.p_module(s, pxd, full_module_name)
      /tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/Cython/Compiler/Main.py:381: FutureWarning: Cython directive 'language_level' not set, using '3str' for now (Py3). This has changed from earlier releases! File: /tmp/pip-install-sir9k2dg/hdbscan_aa682700701c41ffa445f31aed278805/hdbscan/_hdbscan_linkage.pyx
        tree = Parsing.p_module(s, pxd, full_module_name)

      Error compiling Cython file:
      ------------------------------------------------------------
      ...
      import numpy as np
      cimport numpy as np

      from libc.float cimport DBL_MAX

      from dist_metrics cimport DistanceMetric
      ^
      ------------------------------------------------------------

      hdbscan/_hdbscan_linkage.pyx:12:0: 'dist_metrics.pxd' not found

      Error compiling Cython file:
      ------------------------------------------------------------
      ...
      import numpy as np
      cimport numpy as np

      from libc.float cimport DBL_MAX

      from dist_metrics cimport DistanceMetric
      ^
      ------------------------------------------------------------

      hdbscan/_hdbscan_linkage.pyx:12:0: 'dist_metrics/DistanceMetric.pxd' not found

      Error compiling Cython file:
      ------------------------------------------------------------
      ...

      cpdef np.ndarray[np.double_t, ndim=2] mst_linkage_core_vector(
              np.ndarray[np.double_t, ndim=2, mode='c'] raw_data,
              np.ndarray[np.double_t, ndim=1, mode='c'] core_distances,
              DistanceMetric dist_metric,
              ^
      ------------------------------------------------------------

      hdbscan/_hdbscan_linkage.pyx:58:8: 'DistanceMetric' is not a type identifier

      Error compiling Cython file:
      ------------------------------------------------------------
      ...
                      continue

                  right_value = current_distances[j]
                  right_source = current_sources[j]

                  left_value = dist_metric.dist(&raw_data_ptr[num_features *
                                                ^
      ------------------------------------------------------------

      hdbscan/_hdbscan_linkage.pyx:129:42: Cannot convert 'double_t *' to Python object

      Error compiling Cython file:
      ------------------------------------------------------------
      ...
                  right_value = current_distances[j]
                  right_source = current_sources[j]

                  left_value = dist_metric.dist(&raw_data_ptr[num_features *
                                                              current_node],
                                                &raw_data_ptr[num_features * j],
                                                ^
      ------------------------------------------------------------

      hdbscan/_hdbscan_linkage.pyx:131:42: Cannot convert 'double_t *' to Python object
      Compiling hdbscan/_hdbscan_linkage.pyx because it changed.
      [1/1] Cythonizing hdbscan/_hdbscan_linkage.pyx
      Traceback (most recent call last):
        File "/home/runner/.local/share/hatch/env/virtual/arize-phoenix/C8K4HrkP/type/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/home/runner/.local/share/hatch/env/virtual/arize-phoenix/C8K4HrkP/type/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/home/runner/.local/share/hatch/env/virtual/arize-phoenix/C8K4HrkP/type/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
          return _build_backend().build_wheel(wheel_directory, config_settings,
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 416, in build_wheel
          return self._build_with_temp_dir(['bdist_wheel'], '.whl',
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 401, in _build_with_temp_dir
          self.run_setup()
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 487, in run_setup
          super(_BuildMetaLegacyBackend,
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 338, in run_setup
          exec(code, locals())
        File "<string>", line 96, in <module>
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/__init__.py", line 107, in setup
          return distutils.core.setup(**attrs)
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 185, in setup
          return run_commands(dist)
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
          dist.run_commands()
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
          self.run_command(cmd)
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/dist.py", line 1234, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/wheel/bdist_wheel.py", line 343, in run
          self.run_command("build")
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
          self.distribution.run_command(command)
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/dist.py", line 1234, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/build.py", line 131, in run
          self.run_command(cmd_name)
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
          self.distribution.run_command(command)
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/dist.py", line 1234, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "<string>", line 26, in run
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
          self.build_extensions()
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
          self._build_extensions_serial()
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
          self.build_extension(ext)
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/Cython/Distutils/build_ext.py", line 122, in build_extension
          new_ext = cythonize(
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/Cython/Build/Dependencies.py", line 1134, in cythonize
          cythonize_one(*args)
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/Cython/Build/Dependencies.py", line 1301, in cythonize_one
          raise CompileError(None, pyx_file)
      Cython.Compiler.Errors.CompileError: hdbscan/_hdbscan_linkage.pyx
      [end of output]
fvdnabee commented 1 year ago

We're seeing the same issue since today, on Linux x86-64 with Python 3.10. I noticed there aren't any prebuilt wheels, so I'm assuming we were building hdbscan from source before as well. Not sure what changed now to cause the build failure.

MrBeeMovie commented 1 year ago

Having this problem as well. Installing using poetry. No changes to lock file. Was working last week.

Rhaedonius commented 1 year ago

This is also causing issues in Databricks. Cython released a new major version (3.0.0) a few hours ago, so there might be a problem with that on these managed environments: https://pypi.org/project/Cython/#history I tried installing the package from master on WSL and it worked with all Python versions > 3.8 using the newest Cython. Anyway, it might be worth pinning all requirements to be less than the next major version, just to be on the safe side.

EDIT: Databricks runtime 10.4 LTS has issues; 11.3 LTS and 12.2 LTS work fine.

kikefdezl commented 1 year ago

Downgrading Cython to the previous release is not working for me. Still the same error.

mikeldking commented 1 year ago

Same here; Colab doesn't have Cython 3 for me anyway.

Rhaedonius commented 1 year ago

I suggested Cython only because of the timing: they released a new major version right when these errors started popping up. It might not be related.

argonaut76 commented 1 year ago

Downgrading Cython to 0.29.36 is also not working for me.

dafajon commented 1 year ago

Having the same issue on Kaggle notebooks.

lmcinnes commented 1 year ago

There was a recent sklearn release that changed some internals that hdbscan relied on (which is what prompted the 0.8.30 release, to try to fix those). It's possible that this is the issue; can you check what sklearn version you have?

argonaut76 commented 1 year ago

scikit-learn==1.2.2

kenho211 commented 1 year ago

Same issue on ubuntu 18.04 using docker image python:3.8.12

lmcinnes commented 1 year ago

I'm at a bit of a loss; especially if 0.8.29 is also not building anymore. I can at least reproduce this locally, but it is unclear how to fix things since nothing that is currently breaking has changed in quite some time -- so it isn't clear why it is breaking at all.

lmcinnes commented 1 year ago

Okay, I poked the obvious things in terms of module name resolution issues and it seems to have fixed the problem locally. I don't understand what changed, or, indeed, why this particular change is now required, but given the scale of issues people are having I'm going to push those changes out as a 0.8.31 release and hopefully that solves the problems for some people.

nchepanov commented 1 year ago

I have an idea. This might be caused by isolated builds: when I install the package, pip pulls down the most recent version of Cython for the build (regardless of what's installed in my environment).

cython>=0.27 should be updated to cython>=0.27,<3 to prevent the latest version of Cython from being used.

(comment is being updated as I'm testing my hypothesis...)

thomasjv799 commented 1 year ago

The new patch kind of solved the issue for me. https://github.com/scikit-learn-contrib/hdbscan/releases/tag/0.8.31

mikeldking commented 1 year ago

Confirming 0.8.31 is working for me too - which makes me believe @nchepanov's comment makes sense (i.e. the general release of Cython 3.0 caused the break). It aligns timing-wise too.

lmcinnes commented 1 year ago

@nchepanov I believe you are correct; while the changes made allowed Cython 3 to build hdbscan, there seem to be further issues at runtime. Until I have time to figure out and work through all the changes that Cython 3 requires I have added a "<3" requirement for Cython. That seems to resolve all the issues as far as I can tell. I've pushed that out as 0.8.32 and hopefully that can keep things afloat for a while.

Thanks to everyone for flagging the issue and the help tracking down the source of the problem.

Rhaedonius commented 1 year ago

I have an idea. This might be caused by isolated builds: when I install the package, pip pulls down the most recent version of Cython for the build (regardless of what's installed in my environment).

cython>=0.27 should be updated to cython>=0.27,<3 to prevent the latest version of Cython from being used.

(comment is being updated as I'm testing my hypothesis...)

This is more like what I was thinking; I did remember something about isolated builds but could not locate it in the Python docs. I changed the build requirement to "cython<3" in pyproject.toml and managed to build the code for hdbscan 0.8.30 under Databricks 10.4 LTS and Colab. The cython entry in requirements might not be needed, as it's not a runtime requirement (still testing this).
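
In case it helps anyone reproduce that locally, here's the procedure as a rough sketch (it assumes the 0.8.30 tag exists on GitHub and that the build requirement is the "cython>=0.27" entry mentioned above - check pyproject.toml before editing):

git clone --depth 1 --branch 0.8.30 https://github.com/scikit-learn-contrib/hdbscan.git
cd hdbscan
# edit pyproject.toml: change the "cython>=0.27" build requirement to "cython>=0.27,<3"
pip install .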

aaron-skydio commented 1 year ago

I did remember something about isolated builds but could not locate it in the Python docs.

pip does not respect installed versions of packages listed in build-system.requires for PEP 517 packages - yeah, you can also work around this by installing a working (pre-3) version of Cython and passing --no-build-isolation to pip install, which will stop pip from installing a newer version of Cython (3.x) just for the wheel build.
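
For example, something along these lines should do it (the version choices are illustrative, and setuptools/wheel/numpy need to already be present, since pip no longer creates the build environment for you):

# pre-install a pre-3 Cython so the wheel build can use it
pip install "cython<3" numpy
# skip the isolated build environment, so pip won't fetch Cython 3.x for the build
pip install --no-build-isolation hdbscan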

argonaut76 commented 1 year ago

0.8.31 is not working for me. I'm running hdbscan inside a dockerized application, and getting the following error:

Traceback (most recent call last):
  File "/usr/src/app/modules/cluster.py", line 26, in fit
    clusterer = HDBSCAN(min_cluster_size=min_cluster_size, min_samples=self.min_samples, cluster_selection_method=self.cluster_selection_method).fit(vectors)
  File "/usr/local/lib/python3.10/dist-packages/hdbscan/hdbscan_.py", line 1205, in fit
    ) = hdbscan(clean_data, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/hdbscan/hdbscan_.py", line 884, in hdbscan
    _tree_to_labels(
  File "/usr/local/lib/python3.10/dist-packages/hdbscan/hdbscan_.py", line 78, in _tree_to_labels
    condensed_tree = condense_tree(single_linkage_tree, min_cluster_size)
  File "hdbscan/_hdbscan_tree.pyx", line 43, in hdbscan._hdbscan_tree.condense_tree
  File "hdbscan/_hdbscan_tree.pyx", line 114, in hdbscan._hdbscan_tree.condense_tree
TypeError: 'numpy.float64' object cannot be interpreted as an integer

I'm using scikit-learn==1.2.2.

mikeldking commented 1 year ago

0.8.31 is not working for me. I'm running hdbscan inside a dockerized application, and getting the following error: [...] TypeError: 'numpy.float64' object cannot be interpreted as an integer

This is also, unfortunately, the same runtime exception I'm hitting with 0.8.32.

lmcinnes commented 1 year ago

So I definitely saw that runtime error with 0.8.31; in testing that disappeared with 0.8.32. If it is still an issue in 0.8.32 then that's not so good. I was getting all green on the test suite: https://dev.azure.com/lelandmcinnes/HDBSCAN%20builds/_build/results?buildId=901&view=results so I'm not sure what the lingering issue is. Perhaps a clean re-install for 0.8.32?

argonaut76 commented 1 year ago

My application is running in Docker. I did a clean rebuild and am still getting the error:

File "hdbscan/_hdbscan_tree.pyx", line 114, in hdbscan._hdbscan_tree.condense_tree
TypeError: 'numpy.float64' object cannot be interpreted as an integer

I also tried using the most recent scikit-learn release, to no effect.

lmcinnes commented 1 year ago

@argonaut76 I'm loath to just keep pushing out new releases; 0.8.32 seems good on a swathe of platforms. I've pushed some changes to master, however, that may fix your problems. Can you install from GitHub within your docker build and see if those resolve your issues?

argonaut76 commented 1 year ago

@lmcinnes sure, I'll give that a shot.

If it helps, here's some additional context:

As a base docker image I'm using ubuntu:latest, which is Ubuntu 22.04. I have the following dependencies installed that could affect hdbscan:

Cython==0.29.36
hdbscan==0.8.32
joblib==1.3.1
numpy==1.25.1
scikit-learn==1.3.0
scipy==1.11.1
spacy==3.5.3

argonaut76 commented 1 year ago

@lmcinnes that worked!

lmcinnes commented 1 year ago

I'll give it a little while to ensure that the current master works for most people, and then try to push out a 0.8.33 that will hopefully get us over this little hurdle late in the week.

argonaut76 commented 1 year ago

Thanks. What an odd problem.

setu4993 commented 1 year ago

@lmcinnes : Also hitting the same issue with numpy.float64 as @argonaut76.

If possible for you, would love a v0.8.33 release soon instead of later in the week!

tjallard commented 1 year ago

I installed direct from github in Colab and it worked. Thanks!


Cezary-Kuik commented 1 year ago

@lmcinnes : Also hitting the same issue with numpy.float64 as @argonaut76.

If possible for you, would love a v0.8.33 release soon instead of later in the week!

Me too https://github.com/MaartenGr/BERTopic/issues/1412

MartinKlefas commented 1 year ago

0.8.31 is not working for me. I'm running hdbscan inside a dockerized application, and getting the following error: [...] TypeError: 'numpy.float64' object cannot be interpreted as an integer

This is also, unfortunately, the same runtime exception I'm hitting with 0.8.32.

I've just installed 0.8.32 on a local Kubeflow server and am getting this float64 error too. Thought I was going mad, as it was working locally with the hdbscan I installed a couple of months ago!

ookjosh commented 1 year ago

I installed direct from github in Colab and it worked. Thanks!


I was also able to pip install directly from master and it fixed the issue for me. Thank you for the quick fix!

Context in case it's helpful: Pop_OS 22.04, Python 3.10.6 w/ venv. I came across this issue while trying out the example here

msaoudallah commented 1 year ago

I had a similar issue on Ubuntu 23.04 using the docker image python:3.8.12. The solution was to use hdbscan 0.8.31.

MartinKlefas commented 1 year ago

I've used: pip install git+https://github.com/scikit-learn-contrib/hdbscan.git

which installed: Resolved https://github.com/scikit-learn-contrib/hdbscan.git to commit 7611cfed22645187f1ae1dfbd7345d9a96b830ef

that seems to be the commit that should fix this int casting, but I still get the error:

File ~/.local/lib/python3.10/site-packages/hdbscan/hdbscan_.py:80, in _tree_to_labels(X, single_linkage_tree, min_cluster_size, cluster_selection_method, allow_single_cluster, match_reference_implementation, cluster_selection_epsilon, max_cluster_size)
     78 print(type(single_linkage_tree))
     79 print(type(min_cluster_size))
---> 80 condensed_tree = condense_tree(single_linkage_tree, min_cluster_size)
     81 stability_dict = compute_stability(condensed_tree)
     82 labels, probabilities, stabilities = get_clusters(
     83     condensed_tree,
     84     stability_dict,
   (...)
     89     max_cluster_size,
     90 )

File hdbscan/_hdbscan_tree.pyx:43, in hdbscan._hdbscan_tree.condense_tree()
File hdbscan/_hdbscan_tree.pyx:114, in hdbscan._hdbscan_tree.condense_tree()
TypeError: 'numpy.float64' object cannot be interpreted as an integer

If I change hdbscan_.py to cast that tree to integers, then I get errors further down the line, and I can't seem to find the file _hdbscan_tree.pyx to see what's going on there... I get the same error in 0.8.30, 0.8.31 and 0.8.32 from pip; earlier versions won't install because of the Cython errors.

ehoelzl commented 1 year ago

I can also confirm that:

mdolr commented 1 year ago

Experiencing TypeError: 'numpy.float64' object cannot be interpreted as an integer on 0.8.32 as well

ssaee79 commented 1 year ago

I'm running on Colab and also having the same issue with numpy.float64. I've tried restarting the Colab runtime, but that's not working at all.

Yesterday everything worked well for me :(

Any solution to this?

hbeelee commented 1 year ago

Unfortunately still experiencing the same issue with hdbscan 0.8.32 on google colab. I used !pip install git+https://github.com/scikit-learn-contrib/hdbscan.git.

ssaee79 commented 1 year ago

Is that working? Using: !pip install git+https://github.com/scikit-learn-contrib/hdbscan.git

sak96 commented 1 year ago

I had this issue with an older version of the package, 0.8.28.

It seems to be caused by the Cython used while building hdbscan being version 3 or newer. The fix can be to install Cython first and then follow it up with the installation of hdbscan, with build isolation disabled.

Not sure if there is a way to incorporate this in requirements.txt. I have also added numpy, as it is required for the build.

pip3 install --user --no-warn-script-location --disable-pip-version-check Cython==0.29.32 numpy==1.23.4
pip3 install --user --no-warn-script-location --disable-pip-version-check --no-build-isolation hdbscan==0.8.23

Rhaedonius commented 1 year ago

Installing from master is working for me on colab.


All tags before that, when built, raise the numpy.float64 error during tests:

>   union_find.union_(<np.intp_t> row[0], cluster)
E   TypeError: 'numpy.float64' object cannot be interpreted as an integer

hdbscan/_hdbscan_tree.pyx:391: TypeError

EDIT: applying the change in commit 7611cfe fixed the types issue

negarmaleki96 commented 1 year ago

The same error on Colab:

hdbscan/_hdbscan_tree.pyx in hdbscan._hdbscan_tree.condense_tree()
hdbscan/_hdbscan_tree.pyx in hdbscan._hdbscan_tree.condense_tree()
TypeError: 'numpy.float64' object cannot be interpreted as an integer

Any solution?

bamdadfr commented 1 year ago

I got rid of the error by installing the following versions. Thanks to @sak96 for the inspiration.

!pip3 install --user --no-warn-script-location --disable-pip-version-check Cython==0.29.34 numpy==1.23.5
!pip3 install --user --no-warn-script-location --disable-pip-version-check --no-build-isolation hdbscan==0.8.29
Kcnarf commented 1 year ago

pip install git+https://github.com/scikit-learn-contrib/hdbscan.git works fine in development environment 👍 :


More context:

  1. Installation of the packages works fine, but I got the TypeError: 'numpy.float64' object cannot be interpreted as an integer error at runtime, due to the use of hdbscan 0.8.32
  2. It appears that my colleague installed these packages just yesterday, with the only difference being that he used hdbscan v0.8.30, and he doesn't have any runtime issue
  3. So I tried to downgrade hdbscan to v0.8.30, but got the Cython.Compiler.Errors.CompileError: hdbscan/_hdbscan_linkage.pyx error during installation/compilation
  4. So I installed hdbscan from the GitHub repository, and it works fine: the install works fine, and runtime execution works fine.

Will wait for a hdbscan fix (e.g. v0.8.33) for production deployment.

Siltyx commented 1 year ago

pip install git+https://github.com/scikit-learn-contrib/hdbscan.git fixed it for me as well. WSL2, Ubuntu-22.04, Python 3.10.6

mikeldking commented 1 year ago

Installing from git works for me too - but unfortunately it's a tough solution, as we actually publish to PyPI with HDBSCAN as a dependency, and direct dependencies are not supported in this case: https://peps.python.org/pep-0440/#direct-references

The trouble here is that this is broken in Colab, which is one of our primary platforms. It also affects all versions of HDBSCAN, so I don't have a downgrade solution to give.

@lmcinnes totally get your hesitation about releasing a new version - but would love to know if we could do it sooner rather than later given the testimony above. It will really help us triage the duplicate bug reports 😅

At any rate, really appreciate you hopping on and fixing - the Cython 3 release does not feel like it was well orchestrated.
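
To illustrate the constraint (the version number below is hypothetical): a package uploaded to PyPI can declare a normal version specifier as a dependency, but not a PEP 440 direct reference, so the git install only works as a local/dev workaround:

# fine in a published package's dependencies (hypothetical version):
#     hdbscan>=0.8.33
# rejected by PyPI as a direct reference:
#     hdbscan @ git+https://github.com/scikit-learn-contrib/hdbscan.git
# works only as a manual/local install:
pip install "git+https://github.com/scikit-learn-contrib/hdbscan.git"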

mikeldking commented 1 year ago

I'm also happy to sit with a release candidate for a few days until the final fix!

lmcinnes commented 1 year ago

There seems to be a fair amount of positive feedback that the last round of fixes is working, and only a few people still reporting issues, so I think I'll push the release out sooner instead of later.