stanford-crfm / levanter

Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax
https://levanter.readthedocs.io/en/latest/
Apache License 2.0

Llama mixture #706

Closed. abhinavg4 closed this 1 month ago

abhinavg4 commented 1 month ago

Adds support for Llama-style mixture experiments.

In addition to this change, you need to set initialize_from_checkpoint_path in the config and set the mixture weights in the data config to 0.7 and 0.3.
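As a rough illustration of the two settings mentioned above, here is a minimal Python sketch, not the code from this PR. Only initialize_from_checkpoint_path and the 0.7/0.3 weights come from the description; the `data` and `train_weights` key names, the source names `source_a` and `source_b`, the placeholder checkpoint path, and both helper functions are assumptions made for the example. The sampling uses Python's standard library and is not meant to reflect how Levanter mixes datasets internally.

```python
import random

# Hypothetical config fragment. Only initialize_from_checkpoint_path and the
# 0.7 / 0.3 weights come from the PR description; every other key name,
# the source names, and the path are placeholders for illustration.
config = {
    "initialize_from_checkpoint_path": "path/to/checkpoint",
    "data": {
        "train_weights": {"source_a": 0.7, "source_b": 0.3},
    },
}


def normalize_weights(weights):
    """Return a copy of the weights rescaled so they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("mixture weights must sum to a positive value")
    return {name: w / total for name, w in weights.items()}


def sample_sources(weights, n, seed=0):
    """Draw n source names with probability proportional to their weight."""
    rng = random.Random(seed)
    names = list(weights)
    probs = [weights[name] for name in names]
    return rng.choices(names, weights=probs, k=n)


weights = normalize_weights(config["data"]["train_weights"])
draws = sample_sources(weights, 10_000)
frac_a = draws.count("source_a") / len(draws)
print(f"fraction drawn from source_a: {frac_a:.3f}")
```

With these weights, roughly 70% of the draws should come from the first source and 30% from the second.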

dlwh commented 1 month ago

This looks good. Do you have a wandb run I can see?

dlwh commented 1 month ago

Run looks fine. Going to merge.

https://wandb.ai/stanford-mercury/marin/runs/llama1b-fw-txt-dclm-mixture-0825?nw=nwuserabhinavg4