stanford-crfm / levanter

Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax
https://levanter.readthedocs.io/en/latest/
Apache License 2.0

Initialization of models from checkpoints/pretrained models has gotten a bit crazy #780

Open dlwh opened 3 weeks ago

dlwh commented 3 weeks ago

We have a lot of ways of initializing models right now:

  1. from scratch
  2. loading the whole state from a levanter checkpoint of the current run (the default if one is present)
  3. loading the whole state from a levanter checkpoint of a different run (trainer.load_checkpoint_path)
  4. loading just the model weights from a levanter checkpoint (trainer.initialize_from)
  5. loading the weights and optimizer state, with an eye towards changing the data (.initialize_from)
  6. loading from hf (--initialize_from_hf)

This is a bit insane and convoluted. We should come up with some rational strategy here.
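
To make the overlap easier to see, here is a rough sketch that tabulates the six paths above by what each one restores. This is not Levanter's API: `InitMode`, `Restored`, `RESTORES`, and `describe` are hypothetical names made up for illustration. It also assumes that "the whole state" means model weights, optimizer state, and step/data position, and that the hf path restores only model weights; neither is spelled out above.

```python
from dataclasses import dataclass
from enum import Enum


class InitMode(Enum):
    """Hypothetical labels for the six initialization paths in the list."""
    SCRATCH = "from scratch"
    RESUME_CURRENT_RUN = "checkpoint of the current run"
    RESUME_OTHER_RUN = "checkpoint of a different run"
    WEIGHTS_ONLY = "model weights from a checkpoint"
    WEIGHTS_AND_OPTIMIZER = "weights and optimizer state"
    FROM_HF = "from hf"


@dataclass(frozen=True)
class Restored:
    """Which pieces of training state a path brings back (assumed breakdown)."""
    model: bool
    optimizer: bool
    step: bool  # training step / data position


# Assumption: "whole state" = model + optimizer + step; hf = model weights only.
RESTORES = {
    InitMode.SCRATCH: Restored(model=False, optimizer=False, step=False),
    InitMode.RESUME_CURRENT_RUN: Restored(model=True, optimizer=True, step=True),
    InitMode.RESUME_OTHER_RUN: Restored(model=True, optimizer=True, step=True),
    InitMode.WEIGHTS_ONLY: Restored(model=True, optimizer=False, step=False),
    InitMode.WEIGHTS_AND_OPTIMIZER: Restored(model=True, optimizer=True, step=False),
    InitMode.FROM_HF: Restored(model=True, optimizer=False, step=False),
}


def describe(mode: InitMode) -> str:
    """Return a one-line summary of what a given path restores."""
    r = RESTORES[mode]
    labels = (
        ("model weights", r.model),
        ("optimizer state", r.optimizer),
        ("step/data position", r.step),
    )
    parts = [name for name, restored in labels if restored]
    return f"{mode.value}: " + (", ".join(parts) if parts else "nothing (fresh init)")


if __name__ == "__main__":
    for mode in InitMode:
        print(describe(mode))
```

Under those assumptions the six paths collapse to four distinct restore sets, with the rest of the difference being only where the state comes from (current run, another run, or hf). That might be one way to frame a simpler strategy: pick a source, then pick which parts of the state to restore.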