pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License

Feature Request for patchwise message passing interface #1558

Open demmerichs opened 4 years ago

demmerichs commented 4 years ago

🚀 Feature

First of all, thanks for this amazing project. I am thinking about implementing the pairwise and patchwise self-attention presented in this paper for general graph structures. The pairwise part is easy, as it is covered by your message passing interface, but I am not sure whether there is an easy workaround or even a "patchwise" message passing interface in PyG. If not, this might be an interesting addition.

How would a patchwise interface differ from the current one?

Thanks for your help.

Motivation

As the linked paper explains, a patchwise operator is a strict generalization of the standard convolution operator, and on images it outperforms the pairwise operator (see the paper). I realize that ordering neighbors is more natural for image pixels than for other graphs, but I think it is worth trying out.

Additional context

Link again to the paper.

rusty1s commented 4 years ago

Hi and thanks for this issue. You are right that this is currently not supported in PyG, and this is mainly due to two reasons:

  1. It's hard to parallelize: for non-regular graphs we need to either pad neighborhoods with fake nodes or process "each degree" sequentially (degree bucketing), which is known to be rather slow.
  2. It's kind of unnatural to define an ordering of neighbor processing on graphs, i.e., to define non-permutation-invariant GNNs.
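The padding variant from point 1 can be sketched as follows. This is a hedged toy example, not a PyG API: the neighbor lists, feature size, and the fake-node index convention are all illustrative assumptions.

```python
import torch

# Toy irregular graph: neighbor lists of varying length (hypothetical data).
neighborhoods = [
    torch.tensor([1, 2]),     # node 0, degree 2
    torch.tensor([0]),        # node 1, degree 1
    torch.tensor([0, 1, 3]),  # node 2, degree 3
    torch.tensor([2]),        # node 3, degree 1
]
num_nodes = len(neighborhoods)
max_degree = max(n.numel() for n in neighborhoods)

# Pad each neighborhood with a fake index `num_nodes` that points to an
# all-zero dummy feature row, and keep a mask of the real entries.
padded = torch.full((num_nodes, max_degree), num_nodes, dtype=torch.long)
mask = torch.zeros(num_nodes, max_degree, dtype=torch.bool)
for i, nbrs in enumerate(neighborhoods):
    padded[i, : nbrs.numel()] = nbrs
    mask[i, : nbrs.numel()] = True

x = torch.randn(num_nodes, 8)
x_ext = torch.cat([x, x.new_zeros(1, 8)], dim=0)  # extra dummy row

# Dense [num_nodes, max_degree, num_features] patches; an ordered
# (non-permutation-invariant) operator can now run over dim 1.
patches = x_ext[padded]
```

The cost is visible here: every node pays for `max_degree` slots regardless of its actual degree, which is why this is slow for graphs with skewed degree distributions.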

I agree though that this is a generalization of the above message passing interface and might have potential use cases. I actually already have plans to implement it (but it's not top priority). Personally, however, I feel that approaches like our SplineCNN model are a better generalization of CNNs, since they are more robust to varying neighborhood sizes and do not require a specific ordering of neighbors.

demmerichs commented 4 years ago

Hi,

thanks for the fast comment.

  1. Okay, I can see how degree bucketing is bad (though I am not sure why different degrees cannot be processed in parallel?). But I am coming from a point cloud perspective, where one option for creating a graph is kNN. That enforces a constant degree of k at every node and therefore avoids the problem. So in a way, I was not even generalizing as much as you already were :) I think with the restriction of a constant node degree it could be efficient and would not require padding?
  2. There are quite a few problems that could be phrased as graph problems but are actually processed by non-graph networks and perform similarly well without guaranteeing permutation invariance. So in general I find arguments about "naturalness" quite weak: our understanding of DL models and training is still very limited, and it is usually better to evaluate experimentally in which cases permutation invariance matters (probably differing from dataset to dataset). Also, again in the point cloud regime, a semi-natural order would be by distance of the neighbors, and I can imagine other tasks might also have access to some more or less arbitrary orderings.
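The distance-based ordering from point 2 can be sketched as follows. This is a hypothetical example, not a PyG API; the point count, k, and feature size are made up for illustration.

```python
import torch

# Sketch: order each point's k nearest neighbors by distance, giving every
# node a well-defined (non-permutation-invariant) patch ordering.
torch.manual_seed(0)
pos = torch.rand(32, 3)  # 32 points in 3D (hypothetical point cloud)
k = 8

dist = torch.cdist(pos, pos)       # [32, 32] pairwise Euclidean distances
dist.fill_diagonal_(float('inf'))  # exclude self-loops
# topk with largest=False returns the k smallest distances, sorted ascending.
knn_dist, knn_idx = dist.topk(k, largest=False)

x = torch.randn(32, 16)
patches = x[knn_idx]  # [32, k, 16], neighbors ordered closest-first
```

Since `topk` returns its results sorted, slot `j` of every patch consistently holds the `j`-th closest neighbor, which is exactly the kind of semi-natural ordering described above.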

I can totally understand if it is not your priority; I am actually happy to just get a comment :+1: Maybe just one follow-up question then, because I am a bit overwhelmed by all the implementations and have just started looking into GNNs, PyTorch and PyG.

... approaches like our SplineCNN model are a better generalization to CNNs ...

If you have multiple models in mind, could you make a short semi-complete list of those? That would be really appreciated and might help future readers of this issue too.

Thanks again :)

rusty1s commented 4 years ago
  1. I agree. For kNN graphs, you can already do this by simply reshaping your messages into the [num_nodes, num_neighbors, num_features] format (we already do this for the XConv operator). For other graphs (where node degrees vary), parallelization across those degrees is kinda impossible in PyTorch without padding (at least I do not know how).
  2. I agree that one could think of semi-natural orderings for specific domains, e.g., the SpiralNet operator on meshes. If you want to have a closer look into permutation-invariant GNNs for point clouds, you might be interested in the EdgeConv, GMMConv, NNConv, PointConv or SplineConv operators.
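The reshaping trick from point 1 can be sketched as follows. Shapes, the random edge indices, and the final MLP are illustrative assumptions, not the actual XConv implementation.

```python
import torch

# Constant-degree (kNN) graph: every target node receives exactly k messages,
# so the gathered messages form dense patches without any padding.
torch.manual_seed(0)
num_nodes, k, num_features = 10, 4, 16

# Source index of each of the num_nodes * k edges (hypothetical kNN result),
# grouped so that the k edges of each target node are contiguous.
col = torch.randint(num_nodes, (num_nodes * k,))
x = torch.randn(num_nodes, num_features)

messages = x[col]                                    # [num_nodes * k, F]
patches = messages.view(num_nodes, k, num_features)  # [num_nodes, k, F]

# A non-permutation-invariant update, e.g., an MLP over the flattened patch:
mlp = torch.nn.Linear(k * num_features, 32)
out = mlp(patches.reshape(num_nodes, -1))            # [num_nodes, 32]
```

Because the degree is constant, the `view` is free (no copy, no fake nodes), which is why this path is efficient for kNN graphs while the general irregular case is not.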