Using NetworkX Graph as Input

sclipman commented 3 years ago

Hi Oleksandr,

Thanks for creating an excellent Jupyter notebook to accompany this interesting work! I was hoping to test this out, but I am unfamiliar with the .npz input format used. Could you provide a code snippet showing how to use your method when starting with a NetworkX graph constructed from a simple edge list?

For example, starting with a graph built like this:

import networkx as nx
import pandas as pd

edges = pd.read_csv("Edge_List.csv")
graph = nx.from_pandas_edgelist(edges, "Source", "Target")

Thank you! Steve

shchur commented 3 years ago

You should be able to convert the adjacency matrix to scipy.sparse format using to_scipy_sparse_matrix: https://networkx.org/documentation/stable/reference/generated/networkx.convert_matrix.to_scipy_sparse_matrix.html. Then you will need to call A.tocsr() to turn it into a CSR matrix.
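A minimal sketch of that conversion, for readers following along. The toy edge list stands in for the CSV from the question, and sorting the nodes is only an illustrative way to fix the row order. As far as I know, NetworkX 3.x replaced to_scipy_sparse_matrix with to_scipy_sparse_array, so the sketch checks for both.

```python
import networkx as nx
import scipy.sparse as sp

# Toy graph standing in for nx.from_pandas_edgelist(edges, "Source", "Target").
graph = nx.Graph([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])

# Fix the node order so that row i of A always refers to nodes[i].
nodes = sorted(graph.nodes())

# Newer NetworkX releases provide to_scipy_sparse_array; older ones
# only have to_scipy_sparse_matrix. Use whichever is available.
if hasattr(nx, "to_scipy_sparse_array"):
    A = nx.to_scipy_sparse_array(graph, nodelist=nodes, format="csr")
else:
    A = nx.to_scipy_sparse_matrix(graph, nodelist=nodes, format="csr")

# Wrap the result so downstream code sees a float32 scipy csr_matrix.
A = sp.csr_matrix(A, dtype="float32")

print(type(A).__name__, A.shape, A.nnz)
```

Keeping the nodes list around makes it possible to map rows of A (and later rows of the community matrix) back to node names.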

sclipman commented 3 years ago

Great, thanks for your help! So the converted scipy.sparse adjacency matrix would become A in your code:

loader = nocd.data.load_dataset('data/mag_cs.npz')
A, X, Z_gt = loader['A'], loader['X'], loader['Z']
N, K = Z_gt.shape

X would not apply in the absence of node attributes, but where does Z_gt come from in this case?

shchur commented 3 years ago

These are the ground truth community labels. If they are not available, you will need to specify K (the number of communities to detect) yourself.
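A rough sketch of what the setup might look like without Z_gt and without node attributes. The value K = 2 is an arbitrary, hand-picked number, and using the row-normalized adjacency matrix as the feature matrix is an assumption made for illustration, not something confirmed in this thread.

```python
import numpy as np
import scipy.sparse as sp

# Small symmetric adjacency matrix standing in for the converted graph.
A = sp.csr_matrix(np.array([[0, 1, 1, 0],
                            [1, 0, 1, 0],
                            [1, 1, 0, 1],
                            [0, 0, 1, 0]], dtype=np.float32))

# Without Z_gt, N comes from the adjacency matrix and K is chosen by hand.
N = A.shape[0]
K = 2  # hypothetical number of communities to detect

# Assumption: with no node attributes, use the rows of A as features.
# Scale every row to unit L2 norm (rows of all zeros are left untouched).
row_norms = np.sqrt(np.asarray(A.multiply(A).sum(axis=1)).ravel())
row_norms[row_norms == 0] = 1.0
x_norm = sp.diags(1.0 / row_norms) @ A

print(N, K, x_norm.shape)
```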

sclipman commented 3 years ago

Perfect, that works, but then it hits a snag in the training loop. Could this be because the adjacency matrix is a csr_matrix?

I appreciate your help troubleshooting this! We'll be sure to cite your paper and acknowledge you. Detecting overlapping communities is something we want to add on to an upcoming paper.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-3315ab7333dc> in <module>
     11             gnn.eval()
     12             # Compute validation loss
---> 13             Z = F.relu(gnn(x_norm, adj_norm))
     14             val_loss = decoder.loss_full(Z, A)
     15             print(f'Epoch {epoch:4d}, loss.full = {val_loss:.4f}, nmi = {get_nmi():.2f}')

~/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

~/github/Spatial/Spectral Clustering/overlapping-community-detection-master/nocd/nn/gcn.py in forward(self, x, adj)
     85         for idx, gcn in enumerate(self.layers):
     86             if self.dropout != 0:
---> 87                 x = sparse_or_dense_dropout(x, p=self.dropout, training=self.training)
     88             x = gcn(x, adj)
     89             if idx != len(self.layers) - 1:

~/github/Spatial/Spectral Clustering/overlapping-community-detection-master/nocd/nn/gcn.py in sparse_or_dense_dropout(x, p, training)
     18         return torch.cuda.sparse.FloatTensor(x.indices(), new_values, x.size())
     19     else:
---> 20         return F.dropout(x, p=p, training=training)
     21 
     22 

~/opt/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py in dropout(input, p, training, inplace)
    981     return (_VF.dropout_(input, p, training)
    982             if inplace
--> 983             else _VF.dropout(input, p, training))
    984 
    985 

TypeError: dropout(): argument 'input' (position 1) must be Tensor, not csr_matrix

shchur commented 3 years ago

Can you try restarting the notebook and executing all the cells sequentially? The matrix should be converted into a sparse tensor in cell 5 (adj_norm = gnn.normalize_adj(A)), but it seems that this hasn't happened for some reason.
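For context, the TypeError above means a scipy matrix reached a PyTorch function that only accepts tensors. The sketch below shows one generic way to turn a scipy sparse matrix into a PyTorch sparse tensor. The helper name scipy_to_torch_sparse is hypothetical and this is not the nocd implementation; the notebook's own helpers should be preferred where they exist.

```python
import numpy as np
import scipy.sparse as sp
import torch


def scipy_to_torch_sparse(matrix):
    """Convert a scipy sparse matrix to a float32 torch sparse COO tensor."""
    coo = sp.coo_matrix(matrix, dtype=np.float32)
    # A 2 x nnz array of (row, col) positions of the non-zero entries.
    indices = torch.from_numpy(np.vstack([coo.row, coo.col]).astype(np.int64))
    values = torch.from_numpy(coo.data)
    return torch.sparse_coo_tensor(indices, values, coo.shape)


# Usage: a 3 x 3 identity matrix in CSR format becomes a sparse tensor.
x = scipy_to_torch_sparse(sp.eye(3, format="csr"))
print(x.is_sparse, tuple(x.shape))
```

Calling .cuda() on the result moves it to the GPU, which only works on a machine with CUDA available.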

sclipman commented 3 years ago

Thanks! That instance was fine, but uncommenting x_norm = nocd.utils.to_sparse_tensor(x_norm).cuda() solved this. It now progresses further into the training loop, but it looks like it still wants Z_gt to compare against. I actually have the previously calculated modularity cluster for each node saved as a 'modularity' node attribute in the NetworkX graph object. I'm not sure how yet, but I wonder if this can be easily converted to a binary community affiliation matrix (an np.ndarray) and used as Z_gt for the ground truth here, since our ultimate goal is to see how these communities overlap.

Edit: This indeed solved it and works great – thanks again for your help!
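For anyone attempting the same thing, here is a minimal sketch of turning a per-node 'modularity' attribute into a binary affiliation matrix. The toy graph and its labels are made up, and the node ordering is assumed to match the one used when building the adjacency matrix.

```python
import networkx as nx
import numpy as np

# Toy graph with a made-up 'modularity' label on every node.
graph = nx.Graph([("a", "b"), ("b", "c"), ("c", "d")])
nx.set_node_attributes(graph, {"a": 0, "b": 0, "c": 1, "d": 1}, name="modularity")

# Use the same node order as for the adjacency matrix so the rows line up.
nodes = sorted(graph.nodes())
labels = [graph.nodes[n]["modularity"] for n in nodes]

# Map each distinct community label to a column index.
communities = sorted(set(labels))
column_of = {label: j for j, label in enumerate(communities)}

# Z_gt[i, j] = 1 if node i belongs to community j, else 0.
Z_gt = np.zeros((len(nodes), len(communities)), dtype=np.float32)
for i, label in enumerate(labels):
    Z_gt[i, column_of[label]] = 1.0

N, K = Z_gt.shape
print(N, K)
```

Because modularity clustering assigns each node to exactly one community, every row of this matrix contains a single 1.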

shchur / overlapping-community-detection
