stanford-crfm / levanter

Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax

https://levanter.readthedocs.io/en/latest/

Apache License 2.0

Initializing of models from checkpoints/pretrained models has gotten a bit crazy #780

Open dlwh opened 3 weeks ago

dlwh commented 3 weeks ago

We have a lot of ways of initializing models right now:

from scratch
loading the whole state from a levanter checkpoint of the current run (default if present)
loading the whole state from a levanter checkpoint of a different run (trainer.load_checkpoint_path)
loading just the model weights from a levanter checkpoint (trainer.initialize_from)
loading the weights and optimizer state, with an eye towards changing the data (.initialize_from)
loading from HF (--initialize_from_hf)

This is a bit insane and convoluted. We should come up with some rational strategy here.
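
To make the overlap concrete, here is a rough sketch that writes the six cases above as one enumeration with a single resolver. It is not Levanter's actual code: `InitMode`, `InitOptions`, `resolve_init_mode`, and the `keep_optimizer_state` flag are hypothetical names, the field types are guesses, and the precedence (resume the current run first, then require at most one explicit source) is an assumption chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class InitMode(Enum):
    SCRATCH = "from scratch"
    RESUME_CURRENT_RUN = "whole state, current run"
    RESUME_OTHER_RUN = "whole state, different run"
    WEIGHTS_ONLY = "model weights only"
    WEIGHTS_AND_OPTIMIZER = "weights and optimizer state"
    HF = "HF pretrained model"


@dataclass
class InitOptions:
    # Field names mirror the options named in the list; this class is hypothetical.
    load_checkpoint_path: Optional[str] = None
    initialize_from: Optional[str] = None
    initialize_from_hf: Optional[str] = None  # assumed to be a path or model id
    # Hypothetical flag: the list uses initialize_from for two different cases,
    # so something extra is needed to tell them apart.
    keep_optimizer_state: bool = False


def resolve_init_mode(opts: InitOptions, current_run_has_checkpoint: bool) -> InitMode:
    """Pick exactly one mode, or fail loudly if the options conflict."""
    # Assumed precedence: an existing checkpoint for the current run always wins.
    if current_run_has_checkpoint:
        return InitMode.RESUME_CURRENT_RUN

    names = ("load_checkpoint_path", "initialize_from", "initialize_from_hf")
    given = [name for name in names if getattr(opts, name) is not None]
    if len(given) > 1:
        raise ValueError(f"conflicting initialization options: {given}")
    if not given:
        return InitMode.SCRATCH

    if given[0] == "load_checkpoint_path":
        return InitMode.RESUME_OTHER_RUN
    if given[0] == "initialize_from_hf":
        return InitMode.HF
    if opts.keep_optimizer_state:
        return InitMode.WEIGHTS_AND_OPTIMIZER
    return InitMode.WEIGHTS_ONLY
```

Funnelling every case through one function like this would at least turn ambiguous combinations into an explicit error instead of an implicit precedence.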