lucidrains / taylor-series-linear-attention

Explorations into the recently proposed Taylor Series Linear Attention

[Feature request] Self-attention with Persistent Memory #4


MarcusLoppe commented 4 months ago

I've had great luck using persistent memory (learned memory key / values) in the x-transformers decoder layers, and I think it would be a great addition to linear attention here.
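To make the request concrete, here is a minimal sketch of what persistent memory looks like when bolted onto a linear attention layer, following the `num_mem_kv` pattern from x-transformers. This is an assumption-heavy illustration, not this repo's API: the class name and `num_mem_kv` argument are hypothetical, and the attention math below is plain non-causal linear attention rather than the Taylor series feature map this repo uses. The point is just where the learned memories get concatenated.

```python
import torch
from torch import nn
from einops import rearrange, repeat

class LinearAttnWithPersistentMemory(nn.Module):
    # hypothetical sketch - not the TaylorSeriesLinearAttn implementation
    def __init__(self, dim, heads = 8, dim_head = 64, num_mem_kv = 4):
        super().__init__()
        self.heads = heads
        inner_dim = heads * dim_head
        self.to_qkv = nn.Linear(dim, inner_dim * 3, bias = False)
        self.to_out = nn.Linear(inner_dim, dim, bias = False)

        # learned, input-independent key / value "memories",
        # shared across all positions and prepended to every sequence
        self.mem_kv = nn.Parameter(torch.randn(2, heads, num_mem_kv, dim_head))

    def forward(self, x):
        b = x.shape[0]
        q, k, v = self.to_qkv(x).chunk(3, dim = -1)
        q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> b h n d', h = self.heads), (q, k, v))

        # prepend the persistent memory to keys and values
        mk, mv = repeat(self.mem_kv, 'two h m d -> two b h m d', b = b)
        k = torch.cat((mk, k), dim = -2)
        v = torch.cat((mv, v), dim = -2)

        # plain non-causal linear attention (softmax feature maps),
        # standing in for the Taylor series feature map of this repo
        q = q.softmax(dim = -1)
        k = k.softmax(dim = -2)
        context = torch.einsum('b h n d, b h n e -> b h d e', k, v)
        out = torch.einsum('b h n d, b h d e -> b h n e', q, context)

        out = rearrange(out, 'b h n d -> b n (h d)')
        return self.to_out(out)
```

In the causal case the memories would need slightly different handling (every position should be allowed to attend to them), but the concatenation point would be the same.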

Let me know if I can help!