lucidrains / linear-attention-transformer

Transformer based on a variant of attention that is linear in complexity with respect to sequence length
MIT License

Why dim != dim_head * heads? #18

Open zzczzc20 opened 8 months ago

zzczzc20 commented 8 months ago

Dear developer, in your usage example for LinearAttentionTransformerLM, dim != dim_head * heads. I am a little confused by that. Is this an intentional feature of the algorithm?
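
For context, here is a minimal sketch of why those two values need not be tied together in attention layers of this general style. It is illustrative only (the class name `SketchAttention` is hypothetical, and it uses plain softmax attention for brevity rather than the linear attention kernel this library implements): the usual assumption is that the q/k/v projections map the model width `dim` to an internal width `heads * dim_head`, and an output projection maps back to `dim`, so the two are decoupled.

```python
import torch
from torch import nn

class SketchAttention(nn.Module):
    """Illustrative only: shows why dim need not equal dim_head * heads."""
    def __init__(self, dim, heads = 8, dim_head = 64):
        super().__init__()
        inner_dim = heads * dim_head   # internal width, independent of dim
        self.heads = heads
        self.scale = dim_head ** -0.5
        # project from the model width `dim` up (or down) to `inner_dim`
        self.to_q = nn.Linear(dim, inner_dim, bias = False)
        self.to_k = nn.Linear(dim, inner_dim, bias = False)
        self.to_v = nn.Linear(dim, inner_dim, bias = False)
        # project back to `dim` so the residual connections still line up
        self.to_out = nn.Linear(inner_dim, dim)

    def forward(self, x):
        b, n, _ = x.shape
        h = self.heads
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)
        # split heads: (b, n, h * d) -> (b, h, n, d)
        q, k, v = (t.reshape(b, n, h, -1).transpose(1, 2) for t in (q, k, v))
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim = -1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, -1)
        return self.to_out(out)   # back to the model width `dim`

# dim = 512 with heads * dim_head = 8 * 128 = 1024 still works, because the
# linear projections absorb the width change in both directions
attn = SketchAttention(dim = 512, heads = 8, dim_head = 128)
x = torch.randn(1, 16, 512)
assert attn(x).shape == (1, 16, 512)
```

Under that assumption, choosing dim_head * heads larger than dim simply gives the attention layer a wider internal representation, as a capacity knob, rather than being a requirement of the linear attention algorithm itself.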