ck-amrahd opened 9 months ago
🚀 The feature, motivation and pitch

I am unable to find a clean implementation of local multi-headed self-attention in PyTorch Geometric. I found three types of multi-head attention:

- TransformerConv (https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.nn.conv.TransformerConv.html#torch_geometric.nn.conv.TransformerConv), but it computes a linear combination of all features with different attention weights, as opposed to splitting the features across heads and taking a linear combination within each head.
- RGATConv (https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.nn.conv.RGATConv.html), which goes in a similar direction.
- GPSConv (https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.nn.conv.GPSConv.html), which does multi-head attention, but globally rather than over local neighborhoods.

Alternatives

I think it would be nice to have an implementation of local self-attention with multiple heads, where each head attends to a slice of the feature dimension.

Additional context

No response
Hey @rusty1s, can I work on this issue?
Feel free to take this if you want :)
TransformerConv may be a good starting point. I think the main change that needs to be made is in the softmax calculation for each head.
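For concreteness, here is a minimal sketch of what such a layer could look like, built directly on MessagePassing and torch_geometric.utils.softmax rather than by modifying TransformerConv itself. The class name LocalMHAConv and all of its parameters are made up for illustration; this is a sketch, not a proposed final implementation:

```python
import math

import torch
from torch_geometric.nn import MessagePassing
from torch_geometric.utils import softmax


class LocalMHAConv(MessagePassing):
    """Sketch of local multi-head self-attention: node features are
    projected and split into `heads` chunks, and each head runs scaled
    dot-product attention over the 1-hop neighborhood, with the softmax
    taken per head over each node's incoming edges."""

    def __init__(self, in_channels: int, out_channels: int, heads: int = 4):
        super().__init__(aggr='add', node_dim=0)
        assert out_channels % heads == 0
        self.heads = heads
        self.d_head = out_channels // heads
        # Q/K/V projections whose outputs are split across the heads:
        self.q_proj = torch.nn.Linear(in_channels, out_channels)
        self.k_proj = torch.nn.Linear(in_channels, out_channels)
        self.v_proj = torch.nn.Linear(in_channels, out_channels)

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        H, D = self.heads, self.d_head
        q = self.q_proj(x).view(-1, H, D)  # [num_nodes, heads, d_head]
        k = self.k_proj(x).view(-1, H, D)
        v = self.v_proj(x).view(-1, H, D)
        out = self.propagate(edge_index, q=q, k=k, v=v)  # [num_nodes, heads, d_head]
        return out.view(-1, H * D)  # concatenate the heads

    def message(self, q_i, k_j, v_j, index, ptr, size_i):
        # Per-edge, per-head attention logits: [num_edges, heads]
        alpha = (q_i * k_j).sum(dim=-1) / math.sqrt(self.d_head)
        # Softmax per head, normalized over each target node's neighbors:
        alpha = softmax(alpha, index, ptr, size_i)
        return v_j * alpha.unsqueeze(-1)


conv = LocalMHAConv(in_channels=16, out_channels=64, heads=4)
x = torch.randn(10, 16)
edge_index = torch.tensor([[0, 1, 2, 3], [1, 0, 3, 2]])
out = conv(x, edge_index)  # [10, 64]
```

Since alpha has one column per head and softmax normalizes over the neighbor index, each head gets its own attention distribution over the neighborhood, which is the per-head softmax change suggested above.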