pytorch-labs / attention-gym

Helpful tools and examples for working with flex-attention
BSD 3-Clause "New" or "Revised" License
451 stars 21 forks

Add Graphormer mod #8

Open stsouko opened 2 months ago

stsouko commented 2 months ago

In Graphormer, the biases added to the attention scores are learnable.

https://github.com/microsoft/Graphormer

drisspg commented 2 months ago

Yup, we want to support trainable biases. There is some subtlety in how to implement this: broadcasts in the forward pass become reductions in the backward pass. We are still investigating the best way to enable this.
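To make the broadcast/reduction point concrete, here is a minimal pure-Python sketch. It is not FlexAttention code and does not reflect how the library will implement trainable biases. The function names and the choice of one scalar bias per head are assumptions made purely for illustration. A bias of shape `[H]` is broadcast over scores of shape `[H][Q][K]` in the forward pass, so its gradient has to sum the upstream gradient over every `(q, k)` position it was copied to.

```python
# Hypothetical illustration only: these helpers are not part of attention-gym.

def add_bias_forward(scores, bias):
    """Broadcast one scalar bias per head over a [H][Q][K] score tensor."""
    return [
        [[s + bias[h] for s in row] for row in scores[h]]
        for h in range(len(scores))
    ]


def add_bias_backward(grad_out):
    """Gradient w.r.t. the bias: reduce (sum) the upstream gradient over Q and K.

    Each bias[h] was copied to Q*K positions in the forward pass, so its
    gradient is the sum of the upstream gradients at all of those positions.
    """
    return [sum(g for row in head for g in row) for head in grad_out]


def weighted_loss(scores, bias, weights):
    """Toy scalar loss: sum of (scores + bias) * weights."""
    out = add_bias_forward(scores, bias)
    return sum(
        o * w
        for out_head, w_head in zip(out, weights)
        for out_row, w_row in zip(out_head, w_head)
        for o, w in zip(out_row, w_row)
    )


# H=2 heads, Q=2 queries, K=3 keys.
scores = [
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    [[0.7, 0.8, 0.9], [1.0, 1.1, 1.2]],
]
bias = [0.25, -0.5]
# For the toy loss above, the upstream gradient d(loss)/d(out) equals `weights`.
weights = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[0.5, 0.5, 0.5], [1.0, 1.0, 1.0]],
]

grad_bias = add_bias_backward(weights)

# Check the reduction against a central finite difference on each bias entry.
eps = 1e-3
numeric = []
for h in range(len(bias)):
    plus = list(bias)
    minus = list(bias)
    plus[h] += eps
    minus[h] -= eps
    diff = weighted_loss(scores, plus, weights) - weighted_loss(scores, minus, weights)
    numeric.append(diff / (2 * eps))
```

In a fused attention kernel the same sum has to be accumulated across all the tiles that read a given bias entry, which is presumably where the implementation subtlety comes from.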