stanford-crfm / haliax

Named Tensors for Legible Deep Learning in JAX
Apache License 2.0

Use sqrt(fan_in) as default in and truncated_normal #93

dlwh closed 1 month ago

dlwh commented 1 month ago

This better matches modern practice in the LLM space (cf. https://www.microsoft.com/en-us/research/uploads/prod/2021/11/TP5.pdf and OLMo).
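
For readers unfamiliar with the scheme, here is a rough sketch of fan-in-scaled truncated-normal initialization in plain JAX. It is not the code from this PR: the function name `fan_in_truncated_normal`, the `init_scale` parameter, and the truncation at ±3 standard deviations are all assumptions made for illustration.

```python
import math

import jax
import jax.numpy as jnp


def fan_in_truncated_normal(key, fan_in, fan_out, init_scale=1.0):
    """Hypothetical helper: weights ~ truncated normal with std ~ init_scale / sqrt(fan_in)."""
    # Target standard deviation shrinks with the number of inputs, so the
    # pre-activation variance stays roughly constant as the layer gets wider.
    std = init_scale / math.sqrt(fan_in)
    # Sample a unit truncated normal on [-3, 3] (the cutoff is an assumption),
    # then rescale. Truncation makes the realized std slightly smaller than `std`.
    unit = jax.random.truncated_normal(key, -3.0, 3.0, (fan_in, fan_out), jnp.float32)
    return unit * std


key = jax.random.PRNGKey(0)
w = fan_in_truncated_normal(key, fan_in=512, fan_out=256)
print(w.shape, float(jnp.std(w)))
```

Because the samples are cut off at three standard deviations, no weight can exceed `3 * init_scale / sqrt(fan_in)` in magnitude, which avoids the occasional large outlier that an untruncated normal would produce.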