mfinzi / equivariant-MLP

A library for programmatically generating equivariant layers through constraint solving
MIT License

Conceptual questions regarding the library #5

Closed StellaAthena closed 3 years ago

StellaAthena commented 3 years ago

Hello! This is a wonderful library and I am very excited to start using it. I had a conceptual question though, and I want to make sure I’m thinking about the framework correctly before I start using it for experiments.

In the paper you talk about how this generalizes various equivariant models used in previous work, but you don’t go into detail about the relationships between the models (or if you did, I didn’t understand it). Are there conditions under which you can guarantee that the network is identical to the architecture proposed in, e.g., Cohen and Welling 2016? It seems intuitive to hope that this would be the case for regular representations, but I don’t know whether it is.

The main reason I am asking about this is that I am interested in the training dynamics of equivariant models. Can results obtained on your architecture be assumed to hold for GCNNs? What about other types of equivariant models?

mfinzi commented 3 years ago

Hi @StellaAthena,

The answer is yes for the original GCNN and for Invariant and Equivariant Graph Networks, but with some caveats. It's easiest to talk about the linear layers, so let's consider just those for a moment.

The standard G-convolution implementation from Cohen and Welling 2016 uses zero padding and 3×3 filters for the regular representation of ℤ², p4, and p4m (ℤₙ², ℤ₄⋉ℤₙ², and ⅅ₄⋉ℤₙ² in our notation). If circular padding and n×n filters were used instead, G-convolution would coincide exactly with our equivariant linear layer solutions for these groups (with the regular representation). The locality bias of using only 3×3 filters is not derivable from equivariance alone, although it may be possible to add such locality constraints to our solver.
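
To see the simplest instance of this correspondence, here is a rough sketch (untested; it assumes the `V`, `Z`, and `equivariant_basis` API shown in the repo README, and that the basis comes back as a dense array for small reps) checking that the equivariant linear maps for the 1D cyclic translation group are exactly circulant matrices, i.e. circular convolutions:

```python
import numpy as np
from emlp.groups import Z
from emlp.reps import V

n = 6
G = Z(n)                       # cyclic translations; V(G) is the size-n regular rep
rep = V(G) >> V(G)             # representation of linear maps V -> V
Q = np.asarray(rep.equivariant_basis())   # expected shape (n*n, n): one basis
print(Q.shape)                            # vector per circulant diagonal

# Any element of the solution space, reshaped to a matrix, should be circulant,
# i.e. exactly a circular 1D convolution:
W = (Q @ np.random.randn(Q.shape[-1])).reshape(n, n)
print(np.allclose(W[1], np.roll(W[0], 1)))   # each row is a shift of the last
```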

You can identify the relationship between G-convolution and our basis for the equivariant linear layers perhaps most easily by comparing fig. 3 of https://ieeexplore.ieee.org/abstract/document/9153847 with the dense matrix in figure 2d of our paper. Each n²×n² block corresponds to an n×n convolution with circular padding. The block diagonals (of similar-looking matrices) are the bicirculant matrices corresponding to rotated versions of the n×n conv filter (rotated and/or reflected for ⅅ₄⋉ℤₙ²). It takes a bit of staring to see that these are in fact the same as G-convolutions (with circular padding and n×n filters). While we didn't prove the correspondence in the paper, it should be doable.
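
To make the block structure concrete, here is a plain numpy illustration (nothing EMLP-specific) that an n²×n² bicirculant block acting on a flattened image is exactly an n×n convolution with circular padding:

```python
import numpy as np

n = 4
f = np.random.randn(n, n)              # n x n filter on an n x n grid
M = np.zeros((n * n, n * n))           # dense n^2 x n^2 bicirculant block
for i in range(n):
    for j in range(n):                 # output pixel (i, j)
        for a in range(n):
            for b in range(n):         # filter tap (a, b), wrapped indices
                M[i * n + j, ((i + a) % n) * n + (j + b) % n] = f[a, b]

x = np.random.randn(n, n)
y_dense = (M @ x.reshape(-1)).reshape(n, n)
y_conv = np.array([[sum(f[a, b] * x[(i + a) % n, (j + b) % n]
                        for a in range(n) for b in range(n))
                    for j in range(n)] for i in range(n)])
print(np.allclose(y_dense, y_conv))    # True: the dense block is the circular conv
```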

For the linear layers of Invariant and Equivariant Graph Networks (Maron et al. 2018), the solutions are identical (without any subtleties regarding locality or padding).
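
This one is easy to sanity-check numerically. Under the same hedged API assumptions as the sketch above, the equivariant basis for maps from rank-2 tensors (e.g. adjacency matrices) to rank-2 tensors under the permutation group should be 15-dimensional, the Bell number B(4), matching the basis Maron et al. derive by hand:

```python
from emlp.groups import S
from emlp.reps import T

G = S(6)                        # permutation group; any n >= 4 should give 15
rep = T(2)(G) >> T(2)(G)        # linear maps from rank-2 to rank-2 tensors
Q = rep.equivariant_basis()     # assumed to expose .shape like a dense array
print(Q.shape[-1])              # expect 15 = Bell(4)
```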

However, our equivariant linear layers don't encapsulate all of the popular group equivariant networks in the literature, such as General E(2)-Equivariant Steerable CNNs (Weiler and Cesa 2019). In that paper they build GCNNs with stabilizers such as H = D₆. However, D₆ is not a symmetry of the lattice ℤₙ², so one cannot construct D₆⋉ℤₙ² in the intended way. Instead, E(2) CNNs implement a discretization of a GCNN on the group D₆⋉ℝ², based on continuous translations. On this group the E(2) CNN uses the infinite-dimensional regular representation, and the corresponding solution basis of filters is discretized to an n×n sampling. Since EMLP can currently only be applied to finite-dimensional representations, this group-representation combination is not possible in EMLP.

Hopefully this gives you the right idea. As for the other layers, EMLP by default adds a bilinear layer not present in other works, but for some groups it can be removed without harm. For regular representations, the GatedNonlinearity we have implemented simplifies to the swish activation function, which is very similar to the ReLUs used elsewhere.
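
The simplification is just that for a regular-representation channel the gate is the channel value itself, so value·sigmoid(gate) collapses to swish; a two-line check (generic gated nonlinearity, not the library's exact class):

```python
import jax.numpy as jnp
from jax.nn import sigmoid

def gated(values, gates):
    return values * sigmoid(gates)    # generic gated nonlinearity

def swish(x):
    return x * sigmoid(x)             # swish / SiLU

x = jnp.linspace(-3.0, 3.0, 7)
print(jnp.allclose(gated(x, x), swish(x)))   # True: the gates are the values
```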

Back to your original question about training dynamics: our library is primarily aimed at researchers building new equivariant networks, and where EMLP generalizes GCNNs or graph networks it will be less efficient than specialized implementations of those methods. In particular for GCNNs, if you only want a rotation/reflection-equivariant CNN, it would be better to use a library like https://github.com/QUVA-Lab/e2cnn.
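
For reference, a minimal D₆-equivariant conv layer there would look roughly like the following (sketched from the e2cnn README; treat the names in their docs as authoritative, not this snippet):

```python
import torch
from e2cnn import gspaces, nn

r2_act = gspaces.FlipRot2dOnR2(N=6)   # D6: rotations by 60 degrees plus flips
feat_in = nn.FieldType(r2_act, 3 * [r2_act.trivial_repr])    # RGB input
feat_out = nn.FieldType(r2_act, 8 * [r2_act.regular_repr])   # 8 regular fields
conv = nn.R2Conv(feat_in, feat_out, kernel_size=5, padding=2)

x = nn.GeometricTensor(torch.randn(1, 3, 32, 32), feat_in)
y = conv(x)
print(y.tensor.shape)   # 8 * |D6| = 96 output channels
```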

Hope this answers your question,
Marc