xgi-org / xgi

CompleX Group Interactions (XGI) is a Python package for higher-order networks.
https://xgi.readthedocs.io

nodes named after different numbers in adjacency matrix and .draw #276

Closed cperezln closed 1 year ago

cperezln commented 1 year ago

I have this code:

h = xgi.Hypergraph([[1, 2, 3, 7], [4], [5, 6, 7]])
xpl.draw(h, node_labels = True)
a, r = mx.adjacency_matrix(h, index = True)
df = pd.DataFrame(a.toarray())

The output, shown as the plot from xpl.draw (where xpl comes from `from xgi.drawing import xgi_pylab as xpl`) and as the dataframe, is:

[image] [image]

As you can see, the most connected node in the plot is 7, but in the dataframe it is not. Also, the nodes seem to be unordered. Why is this?

maximelucas commented 1 year ago

Hi @cperezln, thanks for your question. This is expected behaviour; two things are happening here.

  1. Nodes are added to the hypergraph in the order in which they appear in your input edges. This means that 7 is added before 4 in your case.
  2. The adjacency matrix, being a numpy array, contains no information about which columns/rows correspond to which node labels. So when you convert it to a pandas DataFrame, by default the index will go from 0 to N-1, regardless of the node labels. That is why there is an option to return the mapping between matrix index and node label with index=True, which you used. You can tell pandas to use the node labels with:

    df = pd.DataFrame(a.toarray(), index=r.values(), columns=r.values())

    which gives: [screenshot]

    from which you can see that node 7 is indeed the most connected, as expected.
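To make the two points above concrete without depending on xgi or pandas, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not XGI's implementation: the names `index_of`, `label_of` and `adj` are hypothetical, and two nodes are treated as adjacent (entry 1) whenever they share an edge.

```python
from itertools import combinations

edges = [[1, 2, 3, 7], [4], [5, 6, 7]]

# Assumption: a node gets its matrix index the first time it appears in an edge,
# which mimics the insertion order described in point 1.
index_of = {}
for edge in edges:
    for node in edge:
        index_of.setdefault(node, len(index_of))

# Mapping from matrix index to node label (the role played by `r` in the thread).
label_of = {i: node for node, i in index_of.items()}

# Binary adjacency stored as nested lists: 1 if two nodes share at least one edge.
n = len(index_of)
adj = [[0] * n for _ in range(n)]
for edge in edges:
    for u, v in combinations(edge, 2):
        adj[index_of[u]][index_of[v]] = 1
        adj[index_of[v]][index_of[u]] = 1

# Row sums keyed by node label instead of by raw matrix index.
row_sums = {label_of[i]: sum(row) for i, row in enumerate(adj)}
node_order = [label_of[i] for i in range(n)]
most_connected = max(row_sums, key=row_sums.get)

print(node_order)       # labels in matrix order
print(most_connected)   # label of the row with the largest sum
```

In this sketch the row at index 3 belongs to node 7, which is why an unlabeled table appears to say that "node 3" is the most connected.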

To control the default order from point 1., you can always add nodes before the edges with

import xgi

H = xgi.Hypergraph()
H.add_nodes_from(range(1, 8))
H.add_edges_from([[1, 2, 3, 7], [4], [5, 6, 7]])

This way the nodes will be in ascending order. Converting the adjacency matrix to a DataFrame without specifying the index will still give an index from 0 to 6 instead of 1 to 7, though.
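The effect of adding nodes first can be sketched in plain Python. The helper `node_index` below is hypothetical and only imitates insertion-order indexing; it is not part of XGI.

```python
def node_index(edges, preset_nodes=()):
    """Assign matrix indices in insertion order: preset nodes first, then edge members."""
    index_of = {}
    for node in preset_nodes:
        index_of.setdefault(node, len(index_of))
    for edge in edges:
        for node in edge:
            index_of.setdefault(node, len(index_of))
    return index_of


edges = [[1, 2, 3, 7], [4], [5, 6, 7]]

# Without preset nodes, the order follows first appearance in the edges.
default_order = list(node_index(edges))

# With nodes 1..7 registered up front, the order is ascending.
preset_order = list(node_index(edges, preset_nodes=range(1, 8)))

# The matrix index still starts at 0, so a mapping back to labels is still needed.
index_to_label = {i: node for node, i in node_index(edges, range(1, 8)).items()}

print(default_order)
print(preset_order)
print(index_to_label)
```

Even with the ascending order, index 0 corresponds to node 1, so the labels still have to be passed explicitly when building the DataFrame.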

One more thing: all the functions you used can be called directly from xgi: xgi.draw() and xgi.adjacency_matrix().

I hope this answers your questions.

nwlandry commented 1 year ago

Thanks for this suggestion, @cperezln! Here's my thought: let's not fix this but describe ways to permute rows/columns in numpy/scipy/pandas in the header of matrix.py so that there's good documentation on this?

cperezln commented 1 year ago

Thank you so much @maximelucas for your help, that was what I needed at the moment. And @nwlandry, I haven't been working on this since I opened the issue, so now that I am going deeper into this (and I will need an ordered structure for the nodes and the computation) I will try the approaches you've proposed. I will let you know how it goes.

nwlandry commented 1 year ago

I wrote this function to sort by the dicts that all the matrix functions output:

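from warnings import warn


def sort_matrix_by_dicts(M, rowdict, coldict):
    """Reorder the rows and columns of M by the values of the index-to-label dicts."""
    mat = M.copy()
    try:
        # Row indices (dict keys) ordered by their labels (dict values)
        mat = mat[sorted(rowdict, key=rowdict.get), :]
    except (TypeError, ValueError, IndexError):
        warn("Sorting rows unsuccessful")

    try:
        # Column indices ordered by their labels
        mat = mat[:, sorted(coldict, key=coldict.get)]
    except (TypeError, ValueError, IndexError):
        warn("Sorting columns unsuccessful")

    return mat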
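For readers who want to see the permutation idea in isolation, here is a dependency-free sketch of the same technique on nested lists. The function name `sort_by_labels` and the small 3x3 example are hypothetical and chosen only for illustration; the function above operates on numpy/scipy matrices instead.

```python
def sort_by_labels(matrix, label_of):
    """Permute rows and columns of a square nested-list matrix so labels ascend.

    `label_of` maps matrix index -> node label, like the dicts returned
    alongside the matrices when index=True.
    """
    # Matrix indices ordered by the label they correspond to.
    order = sorted(label_of, key=label_of.get)
    permuted = [[matrix[i][j] for j in order] for i in order]
    labels = [label_of[i] for i in order]
    return permuted, labels


# Index 0 is node 7, index 1 is node 4, index 2 is node 5; nodes 7 and 5 are adjacent.
matrix = [
    [0, 0, 1],
    [0, 0, 0],
    [1, 0, 0],
]
label_of = {0: 7, 1: 4, 2: 5}

sorted_matrix, sorted_labels = sort_by_labels(matrix, label_of)
print(sorted_labels)
print(sorted_matrix)
```

After the permutation, row and column k of the result refer to the k-th smallest label, so the matrix can be read in ascending node order.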

Should I add a code snippet to the comments in the header of the matrix.py file, or add it to the utilities.py file?

nwlandry commented 1 year ago

Perhaps a comment makes the most sense.