lucidrains / performer-pytorch

An implementation of Performer, a linear attention-based transformer, in Pytorch

Using Performer with GNNs #87

Open jah377 opened 2 years ago

jah377 commented 2 years ago

My understanding of "Rethinking Attention with Performers" is that FAVOR+ approximates the attention matrix and avoids the softmax function. In the README.md you note that the Plain Performer can be used for images and other modalities, just as the authors allude to Performer's use in other areas.

I am interested in using Performer to approximate attention between nodes in a graph neural network. The graph contains a feature vector for each node and boolean edge indices indicating connections between pairs of nodes.
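
For concreteness, here is a minimal sketch of the kind of data I have in mind (the names `node_feats` and `edge_index` are my own, loosely following PyTorch Geometric conventions; they are not from this library):

```python
import torch

num_nodes, feat_dim = 6, 512

# One graph: a feature vector per node, batched as (batch, nodes, features)
node_feats = torch.randn(1, num_nodes, feat_dim)

# Edge indices as a (2, num_edges) tensor of node-pair connections;
# edge_index[:, k] means node edge_index[0, k] connects to node edge_index[1, k]
edge_index = torch.tensor([[0, 1, 2, 3],
                           [1, 2, 3, 4]])
```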

Do you have any recommendations on how this could be done with the current Performer model? I see that Attention.forward() accepts a mask argument.
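
For example, I imagine something along these lines, using the SelfAttention module from the README. I am assuming the mask is a per-node (padding-style) boolean mask with True meaning the node is attended to, since FAVOR+ never materializes the full attention matrix:

```python
import torch
from performer_pytorch import SelfAttention

# Plain (non-causal) Performer self-attention over the nodes, treated as a sequence
attn = SelfAttention(
    dim = 512,
    heads = 8,
    causal = False
)

node_feats = torch.randn(1, 6, 512)       # (batch, nodes, dim)

# Assumed semantics: True = keep node, False = mask it out (e.g. padding)
node_mask = torch.ones(1, 6).bool()
node_mask[:, -1] = False

out = attn(node_feats, mask = node_mask)  # (1, 6, 512)
```

If that reading of the mask is right, it is not obvious to me how the pairwise structure in `edge_index` could be expressed through a per-node mask.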