Open eyuansu62 opened 10 months ago
Why are rotary position encodings (RoPE) applied to only 64 dimensions of each head rather than to the full head dimension?
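To make the question concrete, here is a minimal sketch of what "RoPE on 64 dimensions of each head" could look like. It is not taken from this repository: the function name `apply_partial_rope`, the head size of 256, the base of 10000, and the interleaved pairing of dimensions are all assumptions for illustration, and real implementations may pair dimensions differently.

```python
import math


def apply_partial_rope(x, position, rotary_dim=64, base=10000.0):
    """Rotate the first `rotary_dim` entries of one head vector by position.

    Hypothetical helper: entries beyond `rotary_dim` are passed through
    unchanged, so they carry no positional information.
    """
    if rotary_dim % 2 != 0 or rotary_dim > len(x):
        raise ValueError("rotary_dim must be even and no larger than len(x)")

    out = list(x)
    for i in range(rotary_dim // 2):
        # Each pair (2i, 2i+1) is rotated by an angle that depends on the
        # token position and on a per-pair frequency.
        theta = position * base ** (-2.0 * i / rotary_dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * cos_t - b * sin_t
        out[2 * i + 1] = a * sin_t + b * cos_t
    return out


# Assumed head size of 256: only the first 64 entries are rotated.
head = [1.0] * 256
rotated = apply_partial_rope(head, position=5)
```

In this sketch `rotated[64:]` is identical to `head[64:]`, which is the behaviour the question is asking about: the remaining dimensions of each head see no position signal at all.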