shchur / overlapping-community-detection

Implementation of "Overlapping Community Detection with Graph Neural Networks"
MIT License

Using NetworkX Graph as Input #7

Closed sclipman closed 3 years ago

sclipman commented 3 years ago

Hi Oleksandr,

Thanks for creating an excellent Jupyter notebook to accompany this interesting work! I was hoping to test this out, but am unfamiliar with the .npz input format used. I was wondering if you could provide a code snippet to utilize your method when starting with a NetworkX graph constructed from a simple edge list.

For example starting with graph from:

import networkx as nx
import pandas as pd

edges = pd.read_csv("Edge_List.csv")
graph = nx.from_pandas_edgelist(edges, "Source", "Target")

Thank you! Steve

shchur commented 3 years ago

You should be able to convert the adjacency matrix to scipy.sparse format using this method. Then you will need to call A.tocsr() to turn it into a CSR matrix.
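The linked method is not preserved in this copy of the thread. As a minimal sketch of one possible route (an assumption, not necessarily the method referred to above), `nx.adjacency_matrix` returns a SciPy sparse object that can be wrapped in `sp.csr_matrix`. The toy graph and the `nodes` list are illustrative stand-ins for the graph built from the edge list.

```python
import networkx as nx
import numpy as np
import scipy.sparse as sp

# Toy graph standing in for nx.from_pandas_edgelist(edges, "Source", "Target").
graph = nx.Graph([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])

# Fix a node order once so that row i of A always refers to nodes[i].
nodes = list(graph.nodes())

# nx.adjacency_matrix returns a SciPy sparse object; wrapping it in
# sp.csr_matrix yields a CSR matrix regardless of the exact return type.
A = sp.csr_matrix(nx.adjacency_matrix(graph, nodelist=nodes), dtype=np.float32)

print(A.shape, A.nnz)
```

Keeping `nodes` around is useful later, because any per-node labels have to be arranged in the same row order as `A`.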

sclipman commented 3 years ago

Great, thanks for your help! So, the converted scipy.sparse adjacency matrix would become A in your code

loader = nocd.data.load_dataset('data/mag_cs.npz')
A, X, Z_gt = loader['A'], loader['X'], loader['Z']
N, K = Z_gt.shape

X would not apply in the absence of node attributes, but where does Z_gt come from in this case?

shchur commented 3 years ago

These are the ground truth community labels. If they are not available, you will need to specify K (the # of communities to detect) yourself.
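A minimal sketch of what this could look like when there are no ground truth labels: `N` is read from the adjacency matrix and `K` is picked by hand. The toy matrix and the value `K = 2` are purely illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp

# Toy symmetric adjacency matrix standing in for the converted NetworkX graph.
A = sp.csr_matrix(np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=np.float32))

N = A.shape[0]  # number of nodes, taken from A instead of Z_gt.shape
K = 2           # number of communities to detect, chosen by hand (assumption)
Z_gt = None     # no ground truth available

print(N, K)
```

With `Z_gt` unset, anything that compares predictions against ground truth (such as an NMI score) would presumably have to be skipped or removed.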

sclipman commented 3 years ago

Perfect, that works but then it hits a snag in the training loop. Perhaps due to the adjacency matrix being a csr_matrix?

I appreciate your help troubleshooting this! We'll be sure to cite your paper and acknowledge you. Detecting overlapping communities is something we want to add on to an upcoming paper.

TypeError                                 Traceback (most recent call last)
<ipython-input-8-3315ab7333dc> in <module>
     11             gnn.eval()
     12             # Compute validation loss
---> 13             Z = F.relu(gnn(x_norm, adj_norm))
     14             val_loss = decoder.loss_full(Z, A)
     15             print(f'Epoch {epoch:4d}, loss.full = {val_loss:.4f}, nmi = {get_nmi():.2f}')

~/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/ in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

~/github/Spatial/Spectral Clustering/overlapping-community-detection-master/nocd/nn/ in forward(self, x, adj)
     85         for idx, gcn in enumerate(self.layers):
     86             if self.dropout != 0:
---> 87                 x = sparse_or_dense_dropout(x, p=self.dropout,
     88             x = gcn(x, adj)
     89             if idx != len(self.layers) - 1:

~/github/Spatial/Spectral Clustering/overlapping-community-detection-master/nocd/nn/ in sparse_or_dense_dropout(x, p, training)
     18         return torch.cuda.sparse.FloatTensor(x.indices(), new_values, x.size())
     19     else:
---> 20         return F.dropout(x, p=p, training=training)

~/opt/anaconda3/lib/python3.8/site-packages/torch/nn/ in dropout(input, p, training, inplace)
    981     return (_VF.dropout_(input, p, training)
    982             if inplace
--> 983             else _VF.dropout(input, p, training))

TypeError: dropout(): argument 'input' (position 1) must be Tensor, not csr_matrix

shchur commented 3 years ago

Can you try restarting the notebook and executing all the cells sequentially? The matrix should be converted into a sparse tensor in cell 5 (adj_norm = gnn.normalize_adj(A)), but it seems that this hasn't happened for some reason.
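For context, the TypeError above means a SciPy matrix reached `F.dropout`, which only accepts tensors. The sketch below shows a generic SciPy to PyTorch sparse conversion. It is not the repository's own helper, and the function name `scipy_to_torch_sparse` is hypothetical.

```python
import numpy as np
import scipy.sparse as sp
import torch


def scipy_to_torch_sparse(matrix):
    """Convert any SciPy sparse matrix to a torch sparse COO tensor (float32)."""
    coo = sp.coo_matrix(matrix, dtype=np.float32)
    # Row and column indices stacked into a 2 x nnz integer array.
    indices = torch.from_numpy(np.vstack([coo.row, coo.col]).astype(np.int64))
    values = torch.from_numpy(coo.data)
    return torch.sparse_coo_tensor(indices, values, coo.shape)


A = sp.csr_matrix(np.array([[0, 1], [1, 0]], dtype=np.float32))
A_tensor = scipy_to_torch_sparse(A)
print(type(A_tensor), A_tensor.is_sparse)
```

A tensor produced this way lives on the CPU; moving it to the GPU would need an explicit `.cuda()` or `.to(device)` call.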

sclipman commented 3 years ago

Thanks! That instance was fine but uncommenting x_norm = nocd.utils.to_sparse_tensor(x_norm).cuda() solved this. It progresses further into the training loop now but looks like it still wants Z_gt to compare against. I actually have the modularity cluster (previously calculated) for each node saved as a 'modularity' node attribute in the NetworkX graph object. I'm not sure how yet, but I wonder if this can be easily converted to a np.ndarray binary community affiliation matrix and used as Z_gt for the ground truth here, since our ultimate goal is to see how these communities overlap.

Edit: This indeed solved it and works great – thanks again for your help!
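For anyone landing here later, a minimal sketch of the conversion described above, assuming each node carries exactly one community label. The function name `labels_to_affiliation` is hypothetical, and the toy `labels` dict stands in for what `nx.get_node_attributes(graph, 'modularity')` would return.

```python
import numpy as np


def labels_to_affiliation(labels, nodes):
    """Build a binary N x K affiliation matrix from one label per node.

    labels: dict mapping node -> community label
    nodes:  list of nodes in the same row order as the adjacency matrix
    """
    # Assign each distinct label a column, in order of first appearance.
    column = {}
    for node in nodes:
        column.setdefault(labels[node], len(column))

    Z = np.zeros((len(nodes), len(column)), dtype=np.float32)
    for row, node in enumerate(nodes):
        Z[row, column[labels[node]]] = 1.0
    return Z, list(column)


# Toy stand-in for nx.get_node_attributes(graph, 'modularity').
labels = {"a": 0, "b": 0, "c": 1, "d": 1}
nodes = ["a", "b", "c", "d"]

Z_gt, communities = labels_to_affiliation(labels, nodes)
N, K = Z_gt.shape
print(Z_gt)
```

Because modularity clusters do not overlap, every row of this matrix contains a single 1. The row order must match the node order used when building the adjacency matrix.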