lucidrains / routing-transformer

Fully featured implementation of Routing Transformer
MIT License

How to reconstruct the full attention matrix? #33

Open FarzanT opened 1 year ago

FarzanT commented 1 year ago

Hello,

The implementation for the Reformer model allows for the reconstruction of the full attention matrix (https://github.com/lucidrains/reformer-pytorch#research). There, the Recorder class can expand the attention matrix to its original form. How can one get this full attention matrix for the Routing Transformer? The Recorder class is only compatible with the Reformer model. The full attention matrix is needed for Transformer interpretability/explanation methods, such as the one described here: https://github.com/hila-chefer/Transformer-Explainability

I believe it would involve the lines here: https://github.com/lucidrains/routing-transformer/blob/3f6c461a036e98dbae7e70c623d1c0e0616ef82a/routing_transformer/routing_transformer.py#L407-L417
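
I'm not sure of the exact tensors at those lines, but conceptually the reconstruction seems to boil down to scattering the per-window attention weights back to their original (query, key) positions in a dense `seq_len x seq_len` matrix. Below is a rough, untested sketch of that step; `sparse_attn`, `query_indices`, `key_indices` and `expand_to_full_attention` are hypothetical names, and the real values would have to be captured around the lines linked above (e.g. with a forward hook):

```python
import torch

def expand_to_full_attention(sparse_attn, query_indices, key_indices, seq_len):
    # sparse_attn:   (batch, heads, n_queries, window) attention weights per routed/local window
    # query_indices: (batch, heads, n_queries)         original sequence positions of those queries
    # key_indices:   (batch, heads, n_queries, window) original positions of the keys they attended to
    b, h, n, w = sparse_attn.shape
    full = torch.zeros(b, h, seq_len, seq_len, dtype=sparse_attn.dtype, device=sparse_attn.device)

    # turn each (query, key) pair into a flat index into the (seq_len x seq_len) plane
    rows = query_indices.unsqueeze(-1).expand(b, h, n, w)
    flat_index = (rows * seq_len + key_indices).reshape(b, h, -1)

    # accumulate the sparse weights back into the dense matrix
    full.view(b, h, -1).scatter_add_(-1, flat_index, sparse_attn.reshape(b, h, -1))
    return full

# toy example: 1 batch, 1 head, 4 queries each attending to a window of 2 keys
attn = torch.rand(1, 1, 4, 2)
q_idx = torch.arange(4).view(1, 1, 4)
k_idx = torch.randint(0, 8, (1, 1, 4, 2))
full = expand_to_full_attention(attn, q_idx, k_idx, seq_len=8)  # (1, 1, 8, 8)
```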

KatarinaYuan commented 1 year ago

Hi, have you solved this problem?

FarzanT commented 1 year ago

@KatarinaYuan Hi, unfortunately not; I don't think it's trivial. I decided to fall back to the full attention matrix, but with more efficient implementations such as the fused attention in PyTorch 2.0 and DeepSpeed. Hope that helps!
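
For reference, a minimal sketch of the kind of thing I mean (plain PyTorch 2.0, not part of routing-transformer):

```python
import torch
import torch.nn.functional as F

b, h, n, d = 2, 8, 1024, 64
q, k, v = (torch.randn(b, h, n, d) for _ in range(3))

# fused kernel: fast and memory-efficient, but it does not return the attention weights
out = F.scaled_dot_product_attention(q, k, v)

# for interpretability, the full matrix has to be materialised explicitly at O(n^2) memory
attn = torch.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)  # (b, h, n, n)
out_explicit = attn @ v
```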