jasonlaska / spherecluster

Clustering routines for the unit sphere
https://medium.com/@jaska_at_clara/simple-datetime-disambiguation-fd2374ce664a
MIT License
333 stars 78 forks source link

pip package not up to date? #29

Open mbahri opened 4 years ago

mbahri commented 4 years ago

Hi

First of all, thank you for sharing this package!

I'm installing spherecluster with pip and had to manually edit spherical_kmeans.py to fix the import of _k_means (I changed it to from sklearn.cluster import _k_means_fast as _k_means).

I can see this change has already been made in the repo.

Maybe the pip package isn't up to date?

Best Mehdi

lize1803 commented 4 years ago

thank you, it fix my bugs

hexsix commented 4 years ago

I changed from sklearn.cluster import _k_means to from sklearn.cluster import _k_means_fast as _k_means but got another exception

conda/pip install scikit-learn==0.20.0 may be a better way if the pip package is not up to date

check this issue for more details https://github.com/jasonlaska/spherecluster/issues/26

AlexanderJuestel commented 4 years ago

Yeah, similar issue here. The package sklearn.externals.joblib was removed from sklearn in version 0.23 so I have to downgrade to be able to use spherecluster again.

AndrewAnnex commented 4 years ago

@jasonlaska @dominikstrb @cv3d Thanks again for providing this package, but are you interested in additional help in maintaining this project? If additional people could be added to the repo and the pypi account other members of the community can help fix some of these minor incompatibilities

dominikstrb commented 4 years ago

Hi @AndrewAnnex I only fixed some imports a couple of months ago, as far as I can remember. If there is anything I can do to help, I am happy to!

AndrewAnnex commented 4 years ago

hey @dominikstrb, I think the most pressing thing would be to fix any remaining issues spherecluster running against newer releases of sklearn and cut a new release for pypi to fix existing issues. I am happy to contribute a conda-forge recipe for spherecluster once that is setup.

dominikstrb commented 4 years ago

The biggest problem is that the implementation of k-means was changed in sklearn 0.23.2: https://github.com/scikit-learn/scikit-learn/pull/11950

The internals of the implementation are now very different and it does seem like more than a minor fix in the spherecluster codebase is necessary to accomodate these changes, since spherecluster does not only use the public sklearn api, but the internals of the k-means implementation including functions like _centers_dense, which do not exist anymore since the commit I referenced.

I'm not familiar enough with the current sklearn k-means implementation, but maybe @jeremiedbb, who did that overhaul of k-means can help us.

jeremiedbb commented 4 years ago

I'm not sure I can help you. Basically we can't guarantee that the private api won't change in the future. I'd advise you to only rely on the public api or reimplement all the private parts of the api you need.

In your case it looks like a huge amount of stuff to reimplement while you're just adding a single line in kmeans_single_lloyd.

As a temporary fix you can pin the scikit-learn version but it's not a good solution in the longterm.

Sorry I can't be more helpful