Llama mixture

stanford-crfm / levanter

Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax

https://levanter.readthedocs.io/en/latest/

Apache License 2.0


Closed abhinavg4 closed 1 month ago

abhinavg4 commented 1 month ago

Llama-style experiment mixture support.

In addition to this change, you need to set initialize_from_checkpoint_path in the config and set the mixture weights in the data config to 0.7 and 0.3.
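To make the 0.7 / 0.3 weighting concrete, here is a minimal, library-independent sketch of how mixture weights translate into sampling proportions. It is not Levanter's implementation or config schema: the function `sample_mixture` and the source names `source_a` and `source_b` are hypothetical, and only the two weights come from the comment above.

```python
import random
from collections import Counter


def sample_mixture(weights, num_samples, seed=0):
    """Draw source names in proportion to their mixture weights.

    `weights` maps a source name to a non-negative weight. The weights are
    normalized here, so they do not have to sum to 1.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    names = list(weights)
    probs = [weights[name] / total for name in names]
    rng = random.Random(seed)  # seeded so the draw is reproducible
    return rng.choices(names, weights=probs, k=num_samples)


# Hypothetical source names; the 0.7 / 0.3 split is taken from the comment above.
weights = {"source_a": 0.7, "source_b": 0.3}
draws = sample_mixture(weights, num_samples=10_000)
counts = Counter(draws)
fractions = {name: counts[name] / len(draws) for name in weights}
```

Over many draws, `fractions` should land close to the configured weights, which is a quick sanity check to run against the per-source counts of a real training run.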

dlwh commented 1 month ago

This looks good. Do you have a wandb run I can see?

dlwh commented 1 month ago

Run looks fine. Going to merge.