pytorch-labs / attention-gym

Helpful tools and examples for working with flex-attention
BSD 3-Clause "New" or "Revised" License

Add Graphormer mod #8

Open stsouko opened 3 months ago

stsouko commented 3 months ago

In Graphormer, the biases added to the attention scores are learnable.

https://github.com/microsoft/Graphormer

drisspg commented 3 months ago

Yup, we want to support trainable biases. There is some subtlety in how to implement this: broadcasts in the forward pass become reductions in the backward pass. We are still investigating the best way to enable this.
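To illustrate the broadcast/reduction point, here is a minimal pure-Python sketch. It assumes a per-head bias of shape `[H]` added to scores of shape `[H, Q, K]`. The helper names `add_head_bias` and `head_bias_grad` are hypothetical and are not part of flex-attention or attention-gym. This is not how a fused kernel would do it, only a picture of the gradient flow.

```python
def add_head_bias(scores, bias):
    """Forward: broadcast bias[h] over every (q, k) position of head h."""
    return [[[s + bias[h] for s in row] for row in head]
            for h, head in enumerate(scores)]


def head_bias_grad(grad_out):
    """Backward: sum the upstream gradient over the broadcast (q, k) axes."""
    return [sum(g for row in head for g in row) for head in grad_out]


# Toy scores with 2 heads, 2 queries, 3 keys (assumed shapes for illustration).
scores = [
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    [[1.0, 1.1, 1.2], [1.3, 1.4, 1.5]],
]
bias = [0.5, -0.25]

out = add_head_bias(scores, bias)

# For loss = sum of all outputs, the upstream gradient is all ones.
grad_out = [[[1.0 for _ in row] for row in head] for head in out]

# Each bias[h] was read Q * K = 6 times in the forward pass,
# so its gradient is a reduction over those 6 positions.
grad_bias = head_bias_grad(grad_out)
print(grad_bias)
```

A single bias element is read by many score positions in the forward pass, so in the backward pass many positions must accumulate into that one element. In a tiled attention kernel those positions are spread across blocks, which is presumably where the implementation subtlety comes from.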

jozhang97 commented 2 days ago

Following; I'm also interested in a learnable pair bias.