sknetwork-team / scikit-network

Graph Algorithms
Other
602 stars 67 forks source link

Error when cutting Paris dendrogram #546

Closed ninasachdev closed 1 year ago

ninasachdev commented 1 year ago

Description

Hello,

Thank you for creating scikit-network, it's been a very useful tool for us! I am trying to implement the Paris dendrogram from an SNN adjacency matrix generated from the R package Seurat. Everything runs smoothly and I'm able to cut the dendrogram with cut_straight using default parameters (n_clusters = None). When I try to specify n_clusters, however, I get the ValueError shown below.

My adjacency matrix's dimensions are approx. 50,000 x 50,000, and the resulting dendrogram has quite a few inf values in the height column. Could this issue have to do with the inf values, or perhaps each height value needs to be unique for the dendrogram to be ordered correctly?

What I Did

adjacency = np.load('path_to_adjacency_matrix.npy')
adjacency = sparse.csr_matrix(adjacency)
paris = Paris()
dendrogram = paris.fit_predict(adjacency)
labels = cut_straight(dendrogram, n_clusters=100, return_dendrogram=True)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-209-af82b686f96b> in <module>
----> 1 labels = cut_straight(dendrogram, n_clusters=100, return_dendrogram=True)

~/miniconda3/lib/python3.9/site-packages/sknetwork/hierarchy/postprocess.py in cut_straight(dendrogram, n_clusters, threshold, sort_clusters, return_dendrogram)
    108 
    109     if return_dendrogram and not np.all(np.diff(dendrogram[:, 2]) >= 0):
--> 110         raise ValueError("The third column of the dendrogram must be non-decreasing.")
    111 
    112     cluster = {i: [i] for i in range(n)}

ValueError: The third column of the dendrogram must be non-decreasing.
tbonald commented 1 year ago

Thank you very much for your feedback. This has been fixed (please check the develop branch; will be available in the next release).

ninasachdev commented 1 year ago

Thank you so much for your prompt response! I updated the functions in the hierarchy module, but am running into another error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-50-33674d51a39e> in <module>
----> 1 labels = cut_straight_dev(dendrogram, n_clusters=100, return_dendrogram=True)

<ipython-input-7-adb9d8c1d297> in cut_straight_dev(dendrogram, n_clusters, threshold, sort_clusters, return_dendrogram)
     92             cluster[n + t] = cluster.pop(i) + cluster.pop(j)
     93 
---> 94     return get_labels_dev(dendrogram, cluster, sort_clusters, return_dendrogram)

<ipython-input-7-adb9d8c1d297> in get_labels_dev(dendrogram, cluster, sort_clusters, return_dendrogram)
     19         current_cluster_new = len(clusters)
     20         for i, j, height, _ in dendrogram:
---> 21             i_new = cluster_index.pop(int(i))
     22             j_new = cluster_index.pop(int(j))
     23             if i_new != j_new:

KeyError: 59725

Since there are several inf values in the height column of my dendrogram, reorder_dendrogram is comparing two rows with the same inf height, and thus the order of the dendrogram doesn't change. The cut variable in cut_straight() is also equal to inf.

This KeyError occurs the first time if i_new != j_new: is True in get_labels(). Could this have to do with how my dendrogram is ordered?

Please let me know if I can clarify anything!

tbonald commented 1 year ago

Hard to debug without the dendrogram variable. Is there a way to share it?

ninasachdev commented 1 year ago

Yes I just emailed the dendrogram to you, let me know if you have issues with opening it!

tbonald commented 1 year ago

Hi @ninasachdev I seems that there is an inversion in your dendrogram (on row 5663, the merge is done at a height lower that those of the clusters), causing the error in the cut. So the problem is probably in the generation of the dendrogram. Could you please send me the original graph? Thanks.

tbonald commented 1 year ago

I'm closing this issue. Feel free to reopen it if needed.