scikit-learn-contrib / hdbscan

A high performance implementation of HDBSCAN clustering.
http://hdbscan.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Issue with numpy 2.0: `ValueError: numpy.dtype size changed` #642

Open RogerHYang opened 2 weeks ago

RogerHYang commented 2 weeks ago

The screenshot below shows the error in Colab, reproduced with:

```python
%pip install numpy==2.0 hdbscan scikit-learn==1.5.0

from hdbscan import HDBSCAN
```
[Screenshot (2024-06-17) of the resulting `ValueError: numpy.dtype size changed`]
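A minimal sketch for anyone hitting the same traceback: a guarded import that recognises this specific `ValueError` (matching only the message quoted in the issue title) and re-raises it with a pointer to the workarounds discussed in this thread. The helper names `is_numpy_abi_error` and `import_hdbscan` are hypothetical and not part of hdbscan; only the standard library is assumed.

```python
import importlib

# Substring of the error message quoted in this issue's title.
ABI_MARKER = "numpy.dtype size changed"


def is_numpy_abi_error(exc: BaseException) -> bool:
    """Return True if exc looks like the NumPy binary-incompatibility error from this issue."""
    return isinstance(exc, ValueError) and ABI_MARKER in str(exc)


def import_hdbscan():
    """Hypothetical helper: import hdbscan, explaining the NumPy mismatch if it occurs."""
    try:
        return importlib.import_module("hdbscan")
    except ValueError as exc:
        if is_numpy_abi_error(exc):
            import numpy  # numpy must be importable if hdbscan got this far

            raise ImportError(
                f"hdbscan's compiled extensions do not match the installed numpy "
                f"{numpy.__version__}; rebuild hdbscan from source or install numpy<2"
            ) from exc
        raise  # any other ValueError is unrelated, so propagate it unchanged
```

Only `is_numpy_abi_error` is environment-independent; `import_hdbscan()` behaves differently depending on which hdbscan and numpy builds are installed.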
lmcinnes commented 2 weeks ago

The numpy 2.0 release might mean you have to ensure Cython gets run to rebuild the C sources. Potentially this can be solved with the `--no-binary` option to `pip install`, but I'm not entirely sure this is exactly the issue.
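A minimal Python sketch of the `--no-binary` suggestion, assuming an environment where hdbscan is not yet installed (if it already is, pip treats the requirement as satisfied and skips the rebuild). `no_binary_install_command` and `install_from_source` are hypothetical helpers; the command they run is an ordinary `python -m pip install --no-binary hdbscan hdbscan`. As the comment says, it is not certain that a source build fixes the error.

```python
import subprocess
import sys


def no_binary_install_command(package: str = "hdbscan") -> list[str]:
    """Build a pip command that refuses prebuilt wheels for `package`."""
    # "--no-binary <package>" makes pip build that package from its source
    # distribution, so its Cython/C extensions are compiled locally.
    return [sys.executable, "-m", "pip", "install", "--no-binary", package, package]


def install_from_source(package: str = "hdbscan") -> None:
    """Run the command; check=True raises CalledProcessError if pip fails."""
    subprocess.run(no_binary_install_command(package), check=True)
```

Using `sys.executable -m pip` rather than a bare `pip` keeps the install in the same environment (for example, the notebook kernel) that later runs `from hdbscan import HDBSCAN`.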

lmcinnes commented 2 weeks ago

It looks like numpy 2.0 is not going to play nicely with Cython and hdbscan here, and figuring out how to make it all work will take some time. In the meantime I've pinned the numpy version to <2 for a new hdbscan release, so installing from PyPI (soon) will refuse to combine hdbscan 0.8.37 with numpy 2.0. As another workaround, I would suggest looking at fast_hdbscan, which should work with numpy 2.0 right now.
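To make the `<2` pin concrete, here is a stdlib-only sketch that checks whether a NumPy version string satisfies `numpy<2`, the constraint described above for hdbscan 0.8.37. The function names are hypothetical, and the parsing is deliberately simplified (it looks only at the major component and ignores PEP 440 epochs); the `packaging` library's `SpecifierSet` is the more robust choice for real code.

```python
from importlib import metadata


def numpy_major(version: str) -> int:
    """Return the major component of a version string such as '1.26.4' or '2.0.0rc1'."""
    return int(version.split(".", 1)[0])


def satisfies_numpy_lt_2(version: str) -> bool:
    """True if `version` meets the numpy<2 pin (simplified check)."""
    # Pre-releases such as 2.0.0rc1 are rejected as well, matching PEP 440,
    # where "<2" excludes pre-releases of 2 itself.
    return numpy_major(version) < 2


def installed_numpy_ok() -> bool:
    """Check the numpy installed in this environment against the pin."""
    # Raises metadata.PackageNotFoundError if numpy is not installed.
    return satisfies_numpy_lt_2(metadata.version("numpy"))
```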

seberg commented 2 days ago

@lmcinnes there should be nothing required except uploading no wheels. I have opened https://github.com/scikit-learn-contrib/hdbscan/pull/644

seberg commented 2 days ago

Sorry, it turns out some minor code tweaks were needed. There is also another issue, but I am not sure what it is, since it doesn't seem to be related to NumPy itself.