kingoflolz / mesh-transformer-jax

Model parallel transformers in JAX and Haiku
Apache License 2.0
6.26k stars 890 forks

About RoPE embedding #260

Open eyuansu62 opened 10 months ago

eyuansu62 commented 10 months ago

Why are the rotary position encodings (RoPE) applied to only 64 dimensions of each head rather than to the full head dimension?
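To make the question concrete, here is a minimal NumPy sketch of what "RoPE on only 64 dimensions of each head" could look like. It is an illustration under assumptions, not the repository's actual code: the function names `rotary_angles` and `apply_partial_rope`, the head dimension of 256 in the usage lines, the base of 10000, and the choice to pair adjacent dimensions are all hypothetical choices made for this example.

```python
import numpy as np

def rotary_angles(seq_len, rotary_dims, base=10000.0):
    # One rotation frequency per pair of rotated dimensions (base is an assumption).
    inv_freq = 1.0 / (base ** (np.arange(0, rotary_dims, 2) / rotary_dims))
    angles = np.outer(np.arange(seq_len), inv_freq)  # (seq_len, rotary_dims // 2)
    return np.sin(angles), np.cos(angles)

def apply_partial_rope(x, rotary_dims=64):
    """Rotate only the first `rotary_dims` features of each head.

    x has shape (seq_len, n_heads, head_dim); the remaining
    head_dim - rotary_dims features are passed through unchanged.
    """
    seq_len = x.shape[0]
    x_rot, x_pass = x[..., :rotary_dims], x[..., rotary_dims:]
    sin, cos = rotary_angles(seq_len, rotary_dims)
    sin, cos = sin[:, None, :], cos[:, None, :]  # broadcast over heads

    # Treat adjacent features (0,1), (2,3), ... as 2-D points and rotate each pair.
    x_even, x_odd = x_rot[..., 0::2], x_rot[..., 1::2]
    rotated = np.empty_like(x_rot)
    rotated[..., 0::2] = x_even * cos - x_odd * sin
    rotated[..., 1::2] = x_even * sin + x_odd * cos

    return np.concatenate([rotated, x_pass], axis=-1)

# Usage: 8 positions, 4 heads, head_dim of 256 (assumed for illustration).
q = np.random.default_rng(0).standard_normal((8, 4, 256))
q_out = apply_partial_rope(q, rotary_dims=64)
```

In this sketch the last 192 features of every head carry no positional signal at all, which is the behavior the question is asking about.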