Mistral: A strong, northwesterly wind: Framework for transparent and accessible large-scale language model training, built with Hugging Face 🤗 Transformers.
When running training with DeepSpeed, "train_batch_size" must be set to "auto". This pull request updates a few of the DeepSpeed config files accordingly.
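For illustration, the change to each config file amounts to something like the sketch below. The helper name `set_auto_batch_size` and the example config contents are hypothetical and not taken from the repository; the only detail drawn from this PR is the "train_batch_size" key and its "auto" value. With the Hugging Face Transformers DeepSpeed integration, "auto" lets the Trainer fill in the value from its own training arguments.

```python
import json


def set_auto_batch_size(config: dict) -> dict:
    """Return a copy of a DeepSpeed config with "train_batch_size" set to "auto".

    Hypothetical helper for illustration; the input dict is left unmodified.
    """
    updated = dict(config)
    updated["train_batch_size"] = "auto"
    return updated


# Hypothetical config fragment; the real config files contain other keys.
example_config = {
    "train_batch_size": 512,
    "train_micro_batch_size_per_gpu": "auto",
}

updated_config = set_auto_batch_size(example_config)
print(json.dumps(updated_config, indent=2))
```

Returning a copy rather than editing in place keeps the original values available for comparison when reviewing the change.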