sknetwork-team / scikit-network

Louvain algorithm issue with louvain_core.pyx? ValueError: Buffer dtype mismatch, expected 'int' but got 'long' #574

Closed salman-moh closed 1 month ago

salman-moh commented 3 months ago

Description

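I am trying to cluster with the Louvain algorithm. I expected fit_predict to simply work, but it raises ValueError: Buffer dtype mismatch, expected 'int' but got 'long' (traceback below).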

What I Did

I create a csr_matrix, convert it to an undirected graph, and call fit_predict() on Louvain, as shown below:

    import numpy as np
    from scipy.sparse import csr_matrix
    from sknetwork.clustering import Louvain
    from sknetwork.ranking import PageRank
    from sknetwork.utils import directed2undirected

    # weight, source, destination, X, logger and self.alg_params are defined
    # elsewhere in the reporter's code (not shared); the first three are NumPy arrays.
    louvain = Louvain(**self.alg_params)
    graph = csr_matrix((weight, (source, destination)), shape=(X.shape[0], X.shape[0]))
    logger.info(f'Initial csr.data dtype: {graph.data.dtype}')
    logger.info(f'Initial csr.indices dtype: {graph.indices.dtype}')
    logger.info(f'Initial csr.indptr dtype: {graph.indptr.dtype}')
    graph = directed2undirected(graph)
    # Explicit casts that were tried and left commented out:
    # graph.data = graph.data.astype(np.float32)
    # graph.indices = graph.indices.astype(np.int32)
    # graph.indptr = graph.indptr.astype(np.int32)
    logger.info(f'Post-conversion csr.data dtype: {graph.data.dtype}')
    logger.info(f'Post-conversion csr.indices dtype: {graph.indices.dtype}')
    logger.info(f'Post-conversion csr.indptr dtype: {graph.indptr.dtype}')
    labels = louvain.fit_predict(graph)  # raises the ValueError shown below
    rank = PageRank().fit_predict(graph)

    File "/root/biograph/lightning-storm/lightning_storm/modules/dbscan.py", line 163, in fit
      labels = louvain.fit_predict(graph)
    File "/opt/conda/lib/python3.10/site-packages/sknetwork/clustering/base.py", line 64, in fit_predict
      self.fit(*args, **kwargs)
    File "/opt/conda/lib/python3.10/site-packages/sknetwork/clustering/louvain.py", line 276, in fit
      labels, increase = self._optimize(labels, adjacency, out_weights, in_weights)
    File "/opt/conda/lib/python3.10/site-packages/sknetwork/clustering/louvain.py", line 144, in _optimize
      return optimize_core(labels, indices, indptr, data, out_weights, in_weights, out_cluster_weights,
    File "sknetwork/clustering/louvain_core.pyx", line 12, in sknetwork.clustering.louvain_core.optimize_core
    ValueError: Buffer dtype mismatch, expected 'int' but got 'long'

The indices and indptr are of type np.int32; source, destination and weight are NumPy arrays. I can't share the entire code because of organization policy, but you get the point.

I will try 0.31.0, because we didn't face this problem before. Another interesting point: while my dataset was under 2 million, this was not an issue, but as soon as the data grows over 2 million, I get this error.
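For readers who want to check this on their own data: the 'int' and 'long' in the message are C type names, so it can help to print the dtypes of the three arrays behind the CSR matrix next to the sizes of the matching NumPy types. The following is only a diagnostic sketch on a toy matrix (not the reporter's data); the helper name `csr_dtypes` is hypothetical and is not part of scikit-network, and the thread does not confirm which array triggered the mismatch.

```python
import numpy as np
from scipy.sparse import csr_matrix


def csr_dtypes(matrix):
    """Return the dtypes of the three arrays backing a CSR matrix.

    Hypothetical helper for illustration, not part of scikit-network.
    """
    return {
        "data": matrix.data.dtype,
        "indices": matrix.indices.dtype,
        "indptr": matrix.indptr.dtype,
    }


# Toy stand-in for the (source, destination, weight) arrays in the report.
source = np.array([0, 1, 2])
destination = np.array([1, 2, 0])
weight = np.array([1.0, 2.0, 3.0])
graph = csr_matrix((weight, (source, destination)), shape=(3, 3))

dtypes = csr_dtypes(graph)
for name, dtype in dtypes.items():
    print(f"{name}: {dtype} ({dtype.itemsize * 8}-bit)")

# np.intc corresponds to C 'int'; the type code 'l' corresponds to C 'long'.
c_int_bits = np.dtype(np.intc).itemsize * 8
c_long_bits = np.dtype("l").itemsize * 8
print("C int :", c_int_bits, "bits")
print("C long:", c_long_bits, "bits")
```

If the two C types have different sizes on the machine and any integer array passed to the compiled routine has the 'long' size, a mismatch like the one above is at least plausible.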

salman-moh commented 3 months ago

"Solved" by downgrading to 0.31.0, but I will keep this issue open per the contributors' wish.

tbonald commented 3 months ago

Thanks for the feedback! We couldn't reproduce the error. Could you please test on this graph (6M edges)?

from sknetwork.data import load_netset
from sknetwork.clustering import Louvain
from sknetwork.utils import directed2undirected

dataset = load_netset('wikivitals+')
adjacency = directed2undirected(dataset.adjacency)
louvain = Louvain()
labels = louvain.fit_predict(adjacency)

salman-moh commented 3 months ago

It works for me:

Downloading wikivitals+ from NetSet...
Unpacking archive...
Parsing files...
Done.

Since the error appeared with the switch from the old version to the new one, I suspect that something in the new version, combined with my data, is causing it. Given the nature of the data, I don't know how far this issue can progress.

tbonald commented 3 months ago

I've just pushed a version on the develop branch where I force the types of indices / indptr. Could you please check with your data? Thanks.
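The change on the develop branch is not shown in this thread. As a rough user-side sketch of what forcing the index types could look like, the snippet below casts indices and indptr to one integer dtype after checking that the values fit. The helper `with_index_dtype` is hypothetical and is not the scikit-network fix.

```python
import numpy as np
from scipy.sparse import csr_matrix


def with_index_dtype(matrix, dtype=np.int32):
    """Return a CSR copy whose indices and indptr use the given integer dtype.

    Hypothetical helper for illustration, not part of scikit-network.
    """
    matrix = csr_matrix(matrix)
    limit = np.iinfo(dtype).max
    # indptr[-1] is the number of stored entries; column indices are
    # bounded by the number of columns.
    if matrix.indptr[-1] > limit or matrix.shape[1] > limit:
        raise OverflowError(f"matrix too large for {np.dtype(dtype).name} indices")
    result = matrix.copy()
    # Assign the attributes directly so both arrays end up with the same dtype.
    result.indices = matrix.indices.astype(dtype)
    result.indptr = matrix.indptr.astype(dtype)
    return result


graph = csr_matrix(np.array([[0.0, 1.0], [2.0, 0.0]]))
graph32 = with_index_dtype(graph, np.int32)
graph64 = with_index_dtype(graph, np.int64)
print(graph32.indices.dtype, graph32.indptr.dtype)
print(graph64.indices.dtype, graph64.indptr.dtype)
```

The overflow check matters when casting down to int32: a silent wrap-around in indptr would hand invalid offsets to compiled code.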

salman-moh commented 3 months ago

I get this, and the program stops:

Segmentation fault (core dumped)

mclegrand commented 3 months ago

Hmm, could you try again now? (I just tried always using int64.)

Also, are you by any chance on a 32-bit system?
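One way to answer the 32-bit question is to ask the running interpreter directly. This is a small sketch that uses only the standard library and NumPy; it reports the interpreter's own bitness, which can differ from that of the operating system.

```python
import platform
import struct
import sys

import numpy as np

# Size of a C pointer in this interpreter: 32 or 64 bits.
pointer_bits = struct.calcsize("P") * 8
print("Python build:", pointer_bits, "bit")
print("sys.maxsize > 2**32:", sys.maxsize > 2**32)
print("machine:", platform.machine())

# np.intp is the integer type NumPy uses for indexing on this platform.
intp_bits = np.dtype(np.intp).itemsize * 8
print("NumPy intp:", intp_bits, "bit")
```

Note that a 64-bit interpreter does not imply a 64-bit C long: on 64-bit Windows, for example, C long is 32 bits wide.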

tbonald commented 1 month ago

This issue is supposed to be fixed by the new version of scikit-network. Please re-open the issue if this is not the case. Thanks.