sdpython / papierstat

Lectures on Machine Learning (French)
http://www.xavierdupre.fr/app/papierstat/helpsphinx/index.html
MIT License

Issue with kmeans_constraint.py #26

Open MastafaF opened 4 years ago

MastafaF commented 4 years ago

Hi,

In src/papierstat/mltricks/kmeans_constraint.py there is an issue.

Indeed, _k_means._centers_sparse and _k_means._centers_dense now require a sample_weight parameter.

Please refer to: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/cluster/_k_means_fast.pyx

where _centers_sparse and _centers_dense are defined.

When I pass sample_weight to these calls in the function constraint_kmeans of kmeans_constraint.py:

    while iter < max_iter:
        # compute new clusters
        if scipy.sparse.issparse(X):
            centers = _centers_sparse(X=X, sample_weight=sample_weight, labels=labels,
                                      n_clusters=n_clusters, distances=distances_close)
        else:
            centers = _centers_dense(X=X, sample_weight=sample_weight, labels=labels,
                                     n_clusters=n_clusters, distances=distances_close)
        # association
        _constraint_association(leftover, counters, labels, leftclose, distances_close,
                                centers, X, x_squared_norms, limit, strategy, state=state)

I end up with a Segmentation fault: 11 error.

Any idea how to solve it? :)

Nice work, by the way! 👍

sdpython commented 4 years ago

Thanks. I have no idea how to solve it yet. I would probably need a way to replicate the issue first, so a script would help.
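
A reproduction script could be wrapped along these lines so that a segmentation fault is reported with a Python-level traceback instead of silently killing the session. This is only a sketch under stated assumptions: run_isolated is a hypothetical helper (not part of papierstat or scikit-learn), and the import path, class name, and call inside REPRO are assumptions that should be replaced with the exact failing call. It relies only on the standard library: the child interpreter is started with the faulthandler option enabled, and on POSIX a negative return code means the child was killed by a signal.

```python
import subprocess
import sys


def run_isolated(code, timeout=120):
    """Run `code` in a fresh interpreter with faulthandler enabled.

    Returns (returncode, stderr). On POSIX, a negative return code
    means the child process was terminated by a signal.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


# Hypothetical reproduction body. The import path, class name, and
# parameters are assumptions: replace them with the call that crashes.
REPRO = """
import numpy
from papierstat.mltricks import ConstraintKMeans  # assumed import path
X = numpy.random.RandomState(0).rand(30, 2)
ConstraintKMeans(n_clusters=3).fit(X)
"""

if __name__ == "__main__":
    returncode, stderr = run_isolated(REPRO)
    print("return code:", returncode)
    print(stderr)
```

Posting the return code and the captured stderr together with the installed scikit-learn version would give a starting point for replicating the crash.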