lucidrains / linear-attention-transformer

Transformer based on a variant of attention that is linear complexity in respect to sequence length
MIT License

Tooooo many functions added, but no annotations #15

Open charlesxu90 opened 2 years ago

charlesxu90 commented 2 years ago

Dear author @lucidrains,

This is really impressive work. Scaling techniques from many papers have been brought together in a single project. However, there are no clear annotations, which makes it difficult to understand which function is which and why each one was added. Would it be possible to add annotations for these functions? And how much improvement does each one bring?

Below are the features I identified from my own reading of the code (a usage sketch follows the list):

  1. Supports ideas from multiple efficient-transformer papers: Linformer, Reformer, Efficient Attention, Longformer
  2. Supports encoder, decoder, and full encoder-decoder transformer setups
  3. Supports reversible layers
  4. Supports several positional embeddings: rotary embedding, axial positional embedding, and standard absolute positional embedding
  5. Supports causal and non-causal attention
  6. Supports mixing global and local attention heads, as in ETC
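
For concreteness, here is a minimal sketch of how I believe several of these features are toggled through the `LinearAttentionTransformerLM` constructor. The argument names follow my reading of the README and may not exactly match the current code, so please correct me if any are wrong:

```python
# Minimal sketch (not verified against the current code); argument names
# are taken from my reading of the README and may be slightly off.
import torch
from linear_attention_transformer import LinearAttentionTransformerLM

model = LinearAttentionTransformerLM(
    num_tokens = 20000,
    dim = 512,
    heads = 8,
    depth = 6,
    max_seq_len = 8192,
    causal = True,                # item 5: causal vs. non-causal attention
    n_local_attn_heads = 4,       # item 6: mix of global (linear) and local attention heads
    local_attn_window_size = 128, # receptive field of the local heads
    reversible = True,            # item 3: reversible residual layers (from Reformer)
)

tokens = torch.randint(0, 20000, (1, 8192))
logits = model(tokens)            # expected shape: (1, 8192, 20000)
```

Even a one-line comment per argument like the above, pointing to the paper each idea comes from and the gain it brings, would already help a lot.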

Best regards!