microsoft / topologic

A Python library for intelligently building networks and network embeddings, and for analyzing connected data.
https://topologic.readthedocs.io
MIT License

Union largest connected component omni strategy #58

Closed nicaurvi closed 3 years ago

nicaurvi commented 3 years ago

Previously, the omnibus embedding only included the nodes in the intersection of the largest connected components (LCCs) of each pair of graphs passed as arguments.

Through joint research with JHU, we have developed a new strategy that allows us to embed more nodes. To accomplish this, we:

  1. Create a graph that contains the union of all edges across all time series graphs
  2. Calculate the LCC of that union graph
  3. For each time series graph:
     a. Remove all nodes not contained in the vertex set of the union graph's LCC
     b. Add every node of the union graph's LCC that is missing from the graph as an isolate (no edges)
    • This is important as it allows us to maintain the correct shape of each time series graph's adjacency matrix
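
The steps above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the code in this PR: the function names `largest_connected_component` and `union_lcc_align`, and the choice to represent each graph as a list of undirected edges, are assumptions made for the example.

```python
from collections import defaultdict, deque


def largest_connected_component(edges):
    """Return the vertex set of the largest connected component of an undirected edge list."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen, best = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search to collect the component containing `start`.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node] - component:
                component.add(neighbor)
                queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def union_lcc_align(graphs):
    """Hypothetical helper: align every graph to the LCC of the union graph.

    `graphs` is a list of edge lists. Returns the LCC vertex set and one
    adjacency dict per input graph, all sharing that same vertex set.
    """
    # Steps 1 and 2: union of all edges, then the LCC of that union graph.
    union_edges = [edge for edges in graphs for edge in edges]
    lcc = largest_connected_component(union_edges)

    aligned = []
    for edges in graphs:
        # Step 3b: start from every LCC node; nodes with no edges stay isolates.
        adjacency = {node: set() for node in lcc}
        for u, v in edges:
            # Step 3a: drop anything outside the union graph's LCC.
            if u in lcc and v in lcc:
                adjacency[u].add(v)
                adjacency[v].add(u)
        aligned.append(adjacency)
    return lcc, aligned


# Two small "time series" graphs: nodes 5 and 6 fall outside the union LCC.
g1 = [(1, 2), (2, 3)]
g2 = [(3, 4), (5, 6)]
lcc, aligned = union_lcc_align([g1, g2])
```

In this example the union graph's LCC is `{1, 2, 3, 4}`, so both aligned graphs have four nodes: node 4 is an isolate in the first graph, and nodes 1 and 2 are isolates in the second. Under the previous strategy, intersecting the per-graph LCCs would have kept at most node 3.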

I added a test called test_union_lcc_returns_expected_shape that hopefully illuminates the exact scenario.