pyt-team / challenge-icml-2024

Official repository for the Topological Deep Learning Challenge 2024, organized by TAG-DS & PyT-Team and hosted by GRaM Workshop @ ICML 2024.
https://pyt-team.github.io/packs/challenge.html
MIT License
37 stars 48 forks source link

Hypergraph Heat Kernel Lift (Hypergraph to Simplicial) #58

Open peekxc opened 2 months ago

peekxc commented 2 months ago

This PR implements the hypergraph-to-simplicial topological conversion described in the paper below (1), which is aimed at encoding the higher-order information of a hypergraph into a weighted simplicial complex without loss of information. One notable application of this approach is to bring the power of the heat kernel to the realm of hypergraphs.

Note this is not a 'lift' in the sense that we are going from a higher order structure to a (theoretically) lower order one, though the submission requirements mention this is allowed.

Application example

To illustrate this lift, consider the set of all hypergraphs whose simplicial closures are given by the maximal simplices $S = \{(0,1,2), (1,2,3) \}$. As there are 9 non-maximal faces in the closure, there are $2^9$ distinct hypergraphs with identical closures; for any conversion to be useful, we would like a lift capable of distinguishing between hypergraphs in this set.

Of course, a natural weight map is the identity map $f : S_H \to \{0,1\}$ given by $S(\sigma) = 1$ if $\sigma \in H$ and $0$ otherwise; however, for various reasons, this assignment is not very useful.

Instead, below are 5 hypergraphs and their simplicial lifts weighted using the lift implemented here, along with two featurizations on the right; the first of these shows the amount of heat diffused at each vertex across time starting with a unit amount of heat at the red vertex, the second shows the heat kernel signature, a natural featurization one could associate with this lift. Keeping in line with simplicial weights representing thermal conductivity in the Laplacian sense, the 'width' of each edge is inversely proportional to its weight deduced by the input hypergraph.

hks_diffusion

To illustrate the diffusion curves, consider the 3rd row: the red vertex (0) starts with a unit amount of heat, which it diffuses through the weighted edges (0,1) and (0,2) via the heat kernel. The blue vertex heats up faster than the green due to having higher conductivity, and the orange vertex (3) heats up the slowest due to being not adjacent to the heat source (0).

Observe from above that not only is the HKS able to distinguish between different hypergraphs, but also symmetrically related graphs are handled naturally. To demonstrate this, below is a plot of the MDS embedding computed from the Euclidean distance matrix over the HKS-features, for a heuristic choice of time points $t_1, t_2, \dots, t_k$. image Though not exhibiting perfect symmetry, many of the distances are quite intuitive, and each of the $2^9$ distinct hypergraphs are indeed distinguished by the HKS.

Overview

The high-level algorithmic pipeline of this lift is as follows:

  1. Define an undirected hypergraph $H = (V, \mathcal{E})$ to be converted to simplicial complex
  2. Take the downward / simplicial closure $S$ of $H$
  3. For each simplex $\sigma \in S$, compute a weight map $f: S \to R+$ by combining the simplex's topological weight $w\sigma$ with its associated affinity weight $\omega_\sigma$:

$$w(\sigma) = \omega\sigma + \sum\limits{\sigma' \in \mathrm{cofacet}(\sigma)} w_\sigma'$$

  1. The computed weight function $w(\sigma)$ induces an inner product $\langle \rangle_w$ on the cochain space $C^p(S, \mathbb{R})$ of $S$. To capture higher-order interactions exploiting this inner product, choose a weighted Hodge Laplacian operators $\mathcal{L}_p$:

$$\langle f, g \rangle = \sum\limits{\sigma \in S, \mathrm{dim}(\sigma) = d} w\sigma \cdot f([\sigma]) g([\sigma]), \hspace{1em} \mathcal{L}_p = L_p^{\mathrm{up}} + L_p^{\mathrm{dn}}$$

  1. Choose a weight-dependent featurization of $\mathcal{L}_p$ for learning purposes; a classical featurization is the Heat Kernel Signature, which captures multiscale diffusion-based information via the heat kernel.

Properties of the weights

There are many ways to map higher-order interactions to simplicial weights; to define a notion of 'topological weight', (1) define a weighting scheme that satisfies:

$$ \begin{cases} w\sigma > 0 & \text{for every face } \sigma \in S, \ w\sigma \geq \sum\limits{\tau \supset \sigma} w\tau & \text{for every codim. 1 coface } \tau \in S \ wv = \sum\limits{h \in H} 1(v \in h) & \text{ for all } v \in V. \end{cases} $$

This lifting implements the weighting scheme defined by Eq. 3 & Eqs. (63-65) from (1), which not only satisfies all these properties, but can also be computed quickly for the $d$-skeleton $S_d \subseteq S$.

For hypergraphs, the affinity weight is deduced by the number of times $c$ a face $\sigma \in S$ appears in a hyperedge $h \in H$ of order $n$, while the topological weight depends only the dimensions of the faces that contain the given simplex. For an example of the weighting scheme, see the picture below:

Screenshot 2024-07-12 at 10 03 23 PM

The primary use-case for this simplex-weight mapping is to make a valid inner product on the cochain space that captures higher-order interactions using only simplicial structure. In particular, multiscale invariants such as those deriving from the heat kernel were shown in [1] to yield more information gain than in the unweighted settings.

Implementation details

This lift implementation includes:

  1. An implementation of the topological + affinity weighting scheme from (1)
  2. Fast downward closure code for restricting to $d$-skeletons
  3. Three additional hypergraph datasets (1 toy and 2 real)

NOTE: Two of the three supplied datasets come from Google drive links, which require the gdown package to download (we provide these scripts as valid loaders in the pipeline).

  1. Proof of concept modeling code taken from the topomodelx tutorial page

The lifting code itself requires the hirola package to efficiently compute the $d$-skeleton. This was added as a dependency to the pyproject.toml.

References

  1. Weighted simplicial complexes and their representation power of higher-order network data and topology." Physical Review E 106.3 (2022): 034319, by Baccini, Federica, Filippo Geraci, and Ginestra Bianconi.

  2. Sun, Jian, Maks Ovsjanikov, and Leonidas Guibas. "A concise and provably informative multi‐scale signature based on heat diffusion." Computer graphics forum. Vol. 28. No. 5. Oxford, UK: Blackwell Publishing Ltd, 2009.

gbg141 commented 2 months ago

Hello @peekxc! Thank you for your submission. As we near the end of the challenge, I am collecting participant info for the purpose of selecting and announcing winners. Please email me (or have one member of your team email me) at guillermo_bernardez@ucsb.edu so I can share access to the voting form. In your email, please include:

Before July 12, make sure that your submission respects all Submission Requirements laid out on the challenge page. Any submission that fails to meet this criteria will be automatically disqualified.

gbg141 commented 2 months ago

Hi @peekxc! While checking your submission I found out that you did not provide a tutorial notebook (in the corresponding folder there is only a .py file).

Did you forget to generate it?

peekxc commented 2 months ago

@gbg141 Ahh yes, sorry, I tend to prefer developing using the percent format (or Jupyter cells in VSCode); I can generate a .ipynb file with one click.

Would you like me to convert it?