stanford-crfm / mistral

Mistral: A strong, northwesterly wind: Framework for transparent and accessible large-scale language model training, built with Hugging Face 🤗 Transformers.
Apache License 2.0

Were the Mistral models trained with dropout? #188

Closed ArthurConmy closed 2 years ago

ArthurConmy commented 2 years ago

Were the models trained with dropout?

Searching the repo, I found one config file that sets the dropout parameter to 0.0 and another file that sets it to a non-zero value. I am not sure which one applies to the released models and would love an answer. Thanks!

J38 commented 2 years ago

I believe so, yes: they were trained with dropout.

You can see this in the configs for the checkpoints: https://huggingface.co/stanford-crfm
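As a rough illustration of what checking a checkpoint config could look like, here is a minimal sketch. It assumes the checkpoints use a GPT-2 style `config.json`, where Hugging Face Transformers names the dropout probabilities `attn_pdrop`, `embd_pdrop`, and `resid_pdrop`. The sample JSON and the helper names `dropout_settings` and `uses_dropout` are hypothetical; the values are placeholders and were not read from an actual stanford-crfm checkpoint.

```python
import json

# Hypothetical excerpt of a GPT-2 style config.json. The values are
# placeholders for illustration, NOT read from a stanford-crfm checkpoint.
SAMPLE_CONFIG_JSON = """
{
  "model_type": "gpt2",
  "n_layer": 12,
  "attn_pdrop": 0.1,
  "embd_pdrop": 0.1,
  "resid_pdrop": 0.1
}
"""


def dropout_settings(config: dict) -> dict:
    """Return every numeric entry whose key looks like a dropout probability."""
    return {
        key: value
        for key, value in config.items()
        if ("pdrop" in key or "dropout" in key)
        and isinstance(value, (int, float))
        and not isinstance(value, bool)
    }


def uses_dropout(config: dict) -> bool:
    """True if at least one dropout probability in the config is above zero."""
    return any(value > 0 for value in dropout_settings(config).values())


config = json.loads(SAMPLE_CONFIG_JSON)
print(dropout_settings(config))
print(uses_dropout(config))
```

To check a real checkpoint, download its `config.json` from the model page and pass the parsed dictionary to the same helpers.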

J38 commented 2 years ago

If there are config files with dropout 0.0, I think those are for testing. We want to test that we get the same model, so we set dropout to 0.0 for the test, but the production models definitely have standard dropout.
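To see why a dropout of 0.0 helps when a test needs to reproduce the same model, here is a small sketch of inverted dropout in plain Python. It is an illustration only, not the Mistral test code; the `dropout` function and the toy `activations` list are made up for the example. With `p=0.0` the layer is the identity regardless of the random seed, while with `p>0` two differently seeded runs drop different elements.

```python
import random


def dropout(values, p, rng):
    """Inverted dropout: zero each value with probability p, rescale the rest."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]


# Toy activations (hypothetical data, all non-zero).
activations = [float(i) for i in range(1, 1001)]

# With p=0.0 the output does not depend on the random seed at all.
run_a = dropout(activations, p=0.0, rng=random.Random(1))
run_b = dropout(activations, p=0.0, rng=random.Random(2))

# With p=0.5 each seed produces its own random mask.
noisy_a = dropout(activations, p=0.5, rng=random.Random(1))
noisy_b = dropout(activations, p=0.5, rng=random.Random(2))

print(run_a == run_b == activations)  # True: dropout 0.0 is the identity
print(noisy_a == noisy_b)
```

Setting dropout to 0.0 removes this source of randomness, so a test can compare outputs or weights without depending on how random seeds are handled.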