microsoft / torchscale

Foundation Architecture for (M)LLMs
https://aka.ms/GeneralAI
MIT License
2.98k stars, 201 forks

Update new RetNet settings #69

Closed (sunyt32 closed this 10 months ago)

sunyt32 commented 10 months ago

This PR makes the following modifications to the RetNet settings:

  1. Replace LayerNorm with RMSNorm in both Sub-GroupNorm and Pre-LayerNorm;
  2. Remove the bias from the Linear layers;
  3. Replace the FFN with SwiGLU, and re-allocate the parameters so that value_dim and ffn_dim are both 5/3 * dim;
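
The re-allocation in item 3 can be checked with a quick weight count. The sketch below is illustrative only and is not code from this PR: the helper names are hypothetical, and the "old" setting (value_dim = ffn_dim = 2 * dim with a two-matrix FFN) is an assumption taken from the RetNet paper's description, not from this PR. Norm weights are ignored and all projections are treated as bias-free.

```python
# Illustrative weight count per RetNet layer (hypothetical helpers, not torchscale API).
# Assumption: the retention block has W_Q, W_K (dim x dim), W_V, W_G (dim x value_dim)
# and W_O (value_dim x dim), all without bias.

def retention_params(dim: int, value_dim: int) -> int:
    """Weights in the retention block: Q, K, V, gate and output projections."""
    return 2 * dim * dim + 3 * dim * value_dim


def ffn_params(dim: int, ffn_dim: int) -> int:
    """Weights in a plain two-matrix FFN (up and down projections)."""
    return 2 * dim * ffn_dim


def swiglu_params(dim: int, ffn_dim: int) -> int:
    """Weights in a SwiGLU FFN (gate, up and down projections)."""
    return 3 * dim * ffn_dim


dim = 768  # example width, chosen so that 5/3 * dim is an integer

# Assumed previous setting: value_dim = ffn_dim = 2 * dim, plain FFN.
old_total = retention_params(dim, 2 * dim) + ffn_params(dim, 2 * dim)

# New setting from this PR: value_dim = ffn_dim = 5/3 * dim, SwiGLU FFN.
new_hidden = 5 * dim // 3
new_total = retention_params(dim, new_hidden) + swiglu_params(dim, new_hidden)

print(f"old layer weights: {old_total}")
print(f"new layer weights: {new_total}")
```

Under these assumptions both settings come to 12 * dim^2 weights per layer: SwiGLU adds a third FFN matrix, and shrinking value_dim and ffn_dim from 2 * dim to 5/3 * dim offsets it.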