[Open] zzczzc20 opened 8 months ago
Dear developer, in your usage example for `LinearAttentionTransformerLM`, `dim != dim_head * heads`. I am a little confused by that. Is this an intentional feature of the algorithm?
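For reference, the configuration I am asking about looks roughly like the sketch below (values paraphrased from the README example, so the exact arguments may differ): with `dim = 512`, `heads = 8`, and `dim_head = 128`, the per-head dimensions multiply out to `8 * 128 = 1024`, which does not equal `dim`.

```python
import torch
from linear_attention_transformer import LinearAttentionTransformerLM

# Roughly the README instantiation (argument values assumed for illustration).
# Note: heads * dim_head = 8 * 128 = 1024, while dim = 512.
model = LinearAttentionTransformerLM(
    num_tokens = 20000,
    dim = 512,        # model/embedding dimension
    heads = 8,        # number of attention heads
    dim_head = 128,   # dimension per head; heads * dim_head != dim here
    depth = 1,
    max_seq_len = 8192,
    causal = True
)

x = torch.randint(0, 20000, (1, 8192))
logits = model(x)  # expected shape: (1, 8192, 20000)
```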