scikit-learn-contrib / hdbscan

A high performance implementation of HDBSCAN clustering.
http://hdbscan.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Issue with numpy 2.0: `ValueError: numpy.dtype size changed` #642

Closed RogerHYang closed 2 months ago

RogerHYang commented 5 months ago

The screenshot below shows the error in Colab:

%pip install numpy==2.0 hdbscan scikit-learn==1.5.0

from hdbscan import HDBSCAN
[Screenshot: 2024-06-17 at 7:36:12 AM]

lmcinnes commented 5 months ago

The numpy 2.0 release might mean you have to ensure cython gets run to rebuild the C sources. Potentially this can be solved with the --no-binary option in pip install, but I'm not entirely sure whether this is exactly the issue.
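
A minimal sketch of that suggestion, untested against this issue: pip's `--no-binary` option tells pip to build the named package from its source distribution instead of installing a prebuilt wheel. The helper name `build_from_source_command` is hypothetical and only assembles the command. Whether a source build actually clears the error is the open question above.

```python
import sys


def build_from_source_command(package: str = "hdbscan") -> list[str]:
    """Return a pip command that skips prebuilt wheels for `package`.

    `--no-binary <name>` disables wheels for that one package, so pip
    falls back to building it from the source distribution.
    """
    return [sys.executable, "-m", "pip", "install", "--no-binary", package, package]


# Print the command rather than running it; pass the list to
# subprocess.check_call(...) to actually perform the install.
print(" ".join(build_from_source_command()))
```

In a notebook the equivalent would be `%pip install --no-binary hdbscan hdbscan`, followed by a runtime restart so the rebuilt extension is picked up.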

lmcinnes commented 5 months ago

It looks like numpy 2.0 is not going to play nice with cython and hdbscan here. Figuring out how to make it all work will take some time. In the meantime I've pinned the numpy version to <2 for a new hdbscan release, so installing from PyPI (soon) will object to having hdbscan 0.8.37 and numpy 2.0 together. For other workarounds I would suggest looking at fast_hdbscan, which should work with numpy 2.0 right now.
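
For anyone who wants to check an environment against that pin before importing hdbscan, here is a small sketch. The function name `is_numpy_supported` is hypothetical, and the `<2` bound is taken from the comment above, not from the package metadata.

```python
def is_numpy_supported(version: str, max_major_exclusive: int = 2) -> bool:
    """Return True if `version` (e.g. "1.26.4") is below the pinned major version."""
    major = int(version.split(".")[0])
    return major < max_major_exclusive


# Typical use (assumes numpy is installed):
#   import numpy
#   if not is_numpy_supported(numpy.__version__):
#       print('Try: pip install "numpy<2" hdbscan')
```

Only the major component is compared, so pre-release strings such as "2.0.0rc1" are handled the same way as "2.0.0".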

seberg commented 4 months ago

@lmcinnes there should be nothing required except uploading new wheels. I have opened https://github.com/scikit-learn-contrib/hdbscan/pull/644

seberg commented 4 months ago

Sorry, turns out there were some minor code tweaks needed. And there is another issue, but I am not sure what it is, since it doesn't seem to be related to NumPy itself.

jakirkham commented 4 months ago

Thanks Sebastian and Leland! 🙏

Is there more still needed here?

Or do we just need a release and packages?