stanford-crfm / levanter

Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax
https://levanter.readthedocs.io/en/latest/
Apache License 2.0
491 stars, 78 forks

Change linear init for smaller models #699

Open abhinavg4 opened 3 weeks ago

dlwh commented 2 weeks ago

I think we're probably fine, but it might be good to train a 100M model to be sure.