microsoft / Phi-3CookBook

This is a cookbook for getting started with Phi-3, a family of open-source AI models developed by Microsoft. Phi-3 models are the most capable and cost-effective small language models (SLMs) available, outperforming models of the same size and the next size up across a variety of language, reasoning, coding, and math benchmarks.
MIT License

Recommended way to implement Dropout? #100

Closed hm-ca closed 3 months ago

hm-ca commented 3 months ago

Hello,

My model is starting to overfit on the training dataset, and I was wondering what the recommended way is to implement Dropout regularization in Phi-3-vision.

I already have weight decay configured and am doing some data augmentation, and I'm now wondering whether attention_dropout in config.json is the proper way to do this?

Thanks!

leestott commented 3 months ago

@hm-ca

To add attention dropout, you can adjust the attention_dropout parameter in the model configuration.

Currently, it’s set to 0.0, which means no dropout is applied.

If you want to introduce dropout to help regularize the model and potentially improve its generalization, you can set this parameter to a value between 0.0 and 1.0. For example, setting it to 0.1 would apply a 10% dropout rate to the attention mechanism.
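To make the rate concrete, here is a minimal PyTorch sketch (not Phi-3-specific) of what a 0.1 dropout rate does during training versus inference:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.1)  # 10% dropout rate
x = torch.ones(10_000)

# Training mode: roughly 10% of elements are zeroed at random,
# and the surviving elements are scaled by 1 / (1 - p) to keep
# the expected activation magnitude unchanged.
dropout.train()
y = dropout(x)
zeroed_fraction = (y == 0).float().mean().item()
print(f"fraction zeroed in train mode: {zeroed_fraction:.3f}")

# Eval mode: dropout is a no-op, so inference stays deterministic.
dropout.eval()
assert torch.equal(dropout(x), x)
```

Because dropout only fires in training mode, it regularizes fine-tuning without changing the model's behavior at inference time.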

For the Phi-3-vision model, setting the attention_dropout parameter in the config.json file is a proper way to introduce dropout in the attention layers.

Here's a step-by-step approach:

  1. Adjust attention_dropout in config.json:

    • Set the attention_dropout parameter to a value between 0.0 and 1.0. A common starting point is 0.1 (10% dropout).

    ```json
    {
      "attention_dropout": 0.1
    }
    ```
  2. Verify Other Dropout Parameters:

    • Check whether the configuration has other dropout-related parameters, such as embd_pdrop or resid_pdrop, and adjust them if necessary. (Note that layer_norm_eps is a numerical-stability constant for layer normalization, not a dropout rate.)
  3. Re-train the Model:

    • After making these changes, re-train your model and monitor its performance on the validation set to ensure that the dropout is helping to reduce overfitting.
  4. Experiment and Tune:

    • Experiment with different dropout rates and combinations of regularization techniques to find the optimal settings for your specific dataset and model.
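The steps above can be sketched in code. This is a minimal example, assuming a locally downloaded checkpoint directory containing a config.json; the helper name set_attention_dropout is made up for illustration:

```python
import json
from pathlib import Path


def set_attention_dropout(model_dir: str, p: float = 0.1) -> dict:
    """Update attention_dropout in a checkpoint's config.json in place."""
    cfg_path = Path(model_dir) / "config.json"
    cfg = json.loads(cfg_path.read_text())
    cfg["attention_dropout"] = p  # step 1: set the dropout rate
    cfg_path.write_text(json.dumps(cfg, indent=2))
    return cfg


# Alternatively, transformers can override config values at load time
# without editing the file, e.g.:
#   from transformers import AutoConfig
#   cfg = AutoConfig.from_pretrained(
#       "microsoft/Phi-3-vision-128k-instruct",
#       attention_dropout=0.1,
#       trust_remote_code=True,
#   )
```

After changing the config, reload the model from the edited directory (or with the override) before re-training, so the new dropout rate actually takes effect.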
leestott commented 3 months ago

Closing issue based on the example provided

hm-ca commented 3 months ago

Thanks @leestott .

I'll take it from here!