lucidrains / linear-attention-transformer

Transformer based on a variant of attention that is linear complexity in respect to sequence length
MIT License

Tooooo many functions added, but no annotations #15

Open charlesxu90 opened 2 years ago

charlesxu90 commented 2 years ago

Dear author @lucidrains,

This is really impressive work. Scaling techniques from many papers have been brought together in a single project. However, there are no clear annotations, which makes it difficult to understand which function is which and why each one was added. Would it be possible to add annotations for these functions? And how much improvement does each one bring?

Below are the features I identified from my own reading of the code (a usage sketch follows the list):

  1. Supports ideas from multiple efficient-transformer papers: Linformer, Reformer, Efficient Attention, Longformer
  2. Supports encoder, decoder, and full encoder-decoder transformer setups
  3. Supports reversible layers
  4. Supports several positional embeddings: rotary embedding, axial positional embedding, and standard absolute positional embedding
  5. Supports causal and non-causal attention
  6. Supports mixing global and local attention heads, as in ETC
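
For concreteness, here is a minimal sketch of how I believe several of these features are toggled through the `LinearAttentionTransformerLM` constructor. The argument names follow my reading of the README and may not exactly match the current code, so please correct me if any are wrong:

```python
# Minimal sketch (not verified against the current code); argument names
# are taken from my reading of the README and may be slightly off.
import torch
from linear_attention_transformer import LinearAttentionTransformerLM

model = LinearAttentionTransformerLM(
    num_tokens = 20000,
    dim = 512,
    heads = 8,
    depth = 6,
    max_seq_len = 8192,
    causal = True,                # item 5: causal vs. non-causal attention
    n_local_attn_heads = 4,       # item 6: mix of global (linear) and local attention heads
    local_attn_window_size = 128, # receptive field of the local heads
    reversible = True,            # item 3: reversible residual layers (from Reformer)
)

tokens = torch.randint(0, 20000, (1, 8192))
logits = model(tokens)            # expected shape: (1, 8192, 20000)
```

Even a one-line comment per argument like the above, pointing to the paper each idea comes from and the gain it brings, would already help a lot.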

Best regards!