Open stsouko opened 3 months ago
Yup, we want to support trainable biases. There is some subtlety in how to implement this: broadcasts in the forward pass become reductions in the backward pass. We are still investigating the best way to enable this.
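To illustrate the point about broadcasts and reductions, here is a minimal pure-Python sketch. It is not the library's implementation; the names `add_bias` and `bias_grad` and the per-key bias shape are assumptions made for illustration. A bias with one value per key is broadcast over every query row in the forward pass, so its gradient has to be summed over those rows in the backward pass.

```python
def add_bias(scores, bias):
    """Forward: broadcast a per-key bias (length n_k) over every query row."""
    return [[s + b for s, b in zip(row, bias)] for row in scores]


def bias_grad(grad_out):
    """Backward: each bias entry was reused in every row, so its gradient
    is the sum of the upstream gradient over the broadcast (row) dimension."""
    return [sum(col) for col in zip(*grad_out)]


# Hypothetical 2 queries x 3 keys attention scores and a per-key bias.
scores = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
bias = [1.0, -1.0, 0.5]

biased = add_bias(scores, bias)

# Upstream gradient with the same shape as the biased scores.
grad_out = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
grad_bias = bias_grad(grad_out)
print(grad_bias)  # [11.0, 22.0, 33.0]
```

In a fused attention kernel the score matrix is typically processed in tiles, so this row-wise sum would have to be accumulated across tiles, which is presumably where the subtlety comes from.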
Following, I'm also interested in a learnable pair bias.
In Graphormer, the biases added to the attention scores are learnable.
https://github.com/microsoft/Graphormer
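As a rough sketch of what such a learnable bias can look like, assume (as in Graphormer's spatial encoding) that the bias for a pair of nodes is a learnable scalar looked up by their shortest-path distance. The code below is not taken from the Graphormer repository; `gather_bias`, `table_grad`, and the toy values are hypothetical. The forward pass is a gather from a small table, and the backward pass is a scatter-add, since many pairs share one table entry.

```python
def gather_bias(table, dist):
    """Forward: look up one learnable scalar per (query, key) pair by distance."""
    return [[table[d] for d in row] for row in dist]


def table_grad(grad_out, dist, table_size):
    """Backward: accumulate the upstream gradient of every pair into the
    table entry that pair read from (a scatter-add)."""
    grad = [0.0] * table_size
    for g_row, d_row in zip(grad_out, dist):
        for g, d in zip(g_row, d_row):
            grad[d] += g
    return grad


# Shortest-path distances for a hypothetical 3-node path graph 0 - 1 - 2.
dist = [[0, 1, 2],
        [1, 0, 1],
        [2, 1, 0]]
table = [0.5, -0.25, -1.0]  # one learnable scalar per distance 0, 1, 2

bias = gather_bias(table, dist)  # 3 x 3 bias added to the attention scores

# With an all-ones upstream gradient, each entry receives its usage count.
grad_out = [[1.0] * 3 for _ in range(3)]
grad_table = table_grad(grad_out, dist, len(table))
print(grad_table)  # [3.0, 4.0, 2.0]
```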