lucidrains / routing-transformer

Fully featured implementation of Routing Transformer
MIT License

How to reconstruct the full attention matrix? #33

Open FarzanT opened 1 year ago

FarzanT commented 1 year ago

Hello,

The implementation for the Reformer model allows for the reconstruction of the full attention matrix (https://github.com/lucidrains/reformer-pytorch#research). There, the Recorder class can expand the attention matrix to its original form. How can one get this full attention matrix for the Routing Transformer? The Recorder class is only compatible with the Reformer model. The full attention matrix is needed for Transformer interpretability/explanation methods, such as the one described here: https://github.com/hila-chefer/Transformer-Explainability

I believe it would involve the lines here: https://github.com/lucidrains/routing-transformer/blob/3f6c461a036e98dbae7e70c623d1c0e0616ef82a/routing_transformer/routing_transformer.py#L407-L417
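
I'm not sure of the exact tensors at those lines, but conceptually the reconstruction seems to boil down to scattering the per-window attention weights back to their original (query, key) positions in a dense `seq_len x seq_len` matrix. Below is a rough, untested sketch of that step; `sparse_attn`, `query_indices`, `key_indices` and `expand_to_full_attention` are hypothetical names, and the real values would have to be captured around the lines linked above (e.g. with a forward hook):

```python
import torch

def expand_to_full_attention(sparse_attn, query_indices, key_indices, seq_len):
    # sparse_attn:   (batch, heads, n_queries, window) attention weights per routed/local window
    # query_indices: (batch, heads, n_queries)         original sequence positions of those queries
    # key_indices:   (batch, heads, n_queries, window) original positions of the keys they attended to
    b, h, n, w = sparse_attn.shape
    full = torch.zeros(b, h, seq_len, seq_len, dtype=sparse_attn.dtype, device=sparse_attn.device)

    # turn each (query, key) pair into a flat index into the (seq_len x seq_len) plane
    rows = query_indices.unsqueeze(-1).expand(b, h, n, w)
    flat_index = (rows * seq_len + key_indices).reshape(b, h, -1)

    # accumulate the sparse weights back into the dense matrix
    full.view(b, h, -1).scatter_add_(-1, flat_index, sparse_attn.reshape(b, h, -1))
    return full

# toy example: 1 batch, 1 head, 4 queries each attending to a window of 2 keys
attn = torch.rand(1, 1, 4, 2)
q_idx = torch.arange(4).view(1, 1, 4)
k_idx = torch.randint(0, 8, (1, 1, 4, 2))
full = expand_to_full_attention(attn, q_idx, k_idx, seq_len=8)  # (1, 1, 8, 8)
```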

KatarinaYuan commented 1 year ago

Hi, have you solved this problem?

FarzanT commented 1 year ago

@KatarinaYuan Hi, unfortunately not; I don't think it's trivial. I decided to fall back to the full attention matrix, but with more efficient implementations such as the fused attention in PyTorch 2.0 and DeepSpeed. Hope that helps!
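
For reference, a minimal sketch of the kind of thing I mean (plain PyTorch 2.0, not part of routing-transformer):

```python
import torch
import torch.nn.functional as F

b, h, n, d = 2, 8, 1024, 64
q, k, v = (torch.randn(b, h, n, d) for _ in range(3))

# fused kernel: fast and memory-efficient, but it does not return the attention weights
out = F.scaled_dot_product_attention(q, k, v)

# for interpretability, the full matrix has to be materialised explicitly at O(n^2) memory
attn = torch.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)  # (b, h, n, n)
out_explicit = attn @ v
```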