microsoft / Cream

This is a collection of our NAS and Vision Transformer work.

Rethinking and Improving Relative Position Encoding for Vision Transformer with memory-optimized attention #142

Open jakubMitura14 opened 1 year ago

jakubMitura14 commented 1 year ago

Hello, I was wondering whether your relative position encoding schemes would work with memory-optimized or approximate attention mechanisms, for example FlashAttention (https://arxiv.org/abs/2205.14135).
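
For concreteness, here is a minimal PyTorch sketch of where an additive relative position bias (as in the bias mode of your paper, if I understand it correctly) enters standard attention; the function name, shapes, and `rpe_bias` tensor are mine for illustration, not code from this repo:

```python
# Minimal sketch (not the repo's code): a bias-mode relative position encoding
# adds a learned term to the attention logits before the softmax.
import torch

def attention_with_rpe_bias(q, k, v, rpe_bias):
    """q, k, v: (batch, heads, seq, head_dim); rpe_bias: (heads, seq, seq)."""
    scale = q.shape[-1] ** -0.5
    scores = q @ k.transpose(-2, -1) * scale   # (B, H, N, N) is materialized here
    scores = scores + rpe_bias                 # the RPE term enters pre-softmax
    return scores.softmax(dim=-1) @ v

# Fused kernels such as FlashAttention compute softmax(QK^T)V in tiles and never
# materialize the (N, N) score matrix, so the bias would have to be added tile by
# tile inside the kernel rather than to a full score tensor.
```

So my question is essentially whether that additive term (and the contextual variants) can be pushed inside such a fused kernel.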

wkcn commented 1 year ago

Thanks for your interest in our work!

Let me read the paper and check whether RPE works with approximate attention mechanisms.
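
As a rough way to check the additive-bias case, one could compare an explicit implementation against PyTorch's fused entry point by passing the bias as a float `attn_mask`. This is only a sketch (the tensors below are random stand-ins), and note that supplying an explicit additive mask may keep PyTorch from selecting the FlashAttention backend, so whether the memory savings survive depends on kernel support for biases:

```python
# Sketch of a compatibility check (illustrative only, not the repo's code):
# compare explicit attention-with-bias against PyTorch's fused entry point,
# passing the relative-position bias as an additive float attn_mask.
import torch
import torch.nn.functional as F

B, H, N, D = 2, 4, 16, 32
q, k, v = (torch.randn(B, H, N, D) for _ in range(3))
rpe_bias = torch.randn(H, N, N)          # stand-in for a learned RPE bias table

# Explicit reference: bias added to the scaled logits before the softmax.
scores = q @ k.transpose(-2, -1) * D ** -0.5 + rpe_bias
ref = scores.softmax(dim=-1) @ v

# Fused path: the float attn_mask is added to the logits inside the kernel.
# With an explicit mask, PyTorch may fall back from the FlashAttention backend
# to the memory-efficient or math backend, depending on the version.
fused = F.scaled_dot_product_attention(q, k, v, attn_mask=rpe_bias)

print(torch.allclose(ref, fused, atol=1e-5))
```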