Closed: hm-ca closed this issue 3 months ago

hm-ca:

Hello,

My model is starting to overfit on the training dataset, and I was wondering what the recommended way to implement dropout regularization in Phi-3-vision is. I already have weight decay configured and am doing some data augmentation; is setting `attention_dropout` in `config.json` the proper way?

Thanks!

leestott:

@hm-ca
To answer your question about attention dropout: you can adjust the `attention_dropout` parameter in the model configuration. It currently defaults to 0.0, which means no dropout is applied. If you want to introduce dropout to help regularize the model and potentially improve its generalization, set this parameter to a value between 0.0 and 1.0. For example, setting it to 0.1 applies a 10% dropout rate to the attention mechanism.
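For intuition, here is a minimal, self-contained sketch (plain PyTorch, illustrative only, not the actual Phi-3 implementation) showing where this parameter acts: dropout is applied to the softmax-normalized attention weights during training.

```python
import torch
import torch.nn.functional as F

def attention(q, k, v, attention_dropout=0.1, training=True):
    # Scaled dot-product attention scores.
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    weights = F.softmax(scores, dim=-1)
    # attention_dropout zeroes a fraction of the attention weights
    # (rescaling the rest) on each training step; it is a no-op when
    # training=False, i.e. at inference time.
    weights = F.dropout(weights, p=attention_dropout, training=training)
    return weights @ v

q = k = v = torch.randn(1, 8, 16, 64)  # (batch, heads, seq_len, head_dim)
out = attention(q, k, v)               # regularized attention output
```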
For the Phi-3-vision model, setting the `attention_dropout` parameter in the `config.json` file is a proper way to introduce dropout in the attention layers.
Here's a step-by-step approach:

1. Adjust `attention_dropout` in `config.json`: set the `attention_dropout` parameter to a value between 0.0 and 1.0. A common starting point is 0.1 (10% dropout). You can also apply the same override programmatically, as in the sketch after this list.

```json
{
  "attention_dropout": 0.1
}
```

2. Verify other dropout parameters: check whether the configuration exposes related settings such as `hidden_dropout_prob`, and adjust them if necessary. (Note that `layer_norm_eps` is a layer-norm epsilon for numerical stability, not a dropout rate, so it normally stays as-is.)

3. Re-train the model: dropout only takes effect during training, so fine-tune again with the updated configuration.

4. Experiment and tune: try a few values and monitor the validation loss to find the setting that best reduces overfitting without hurting training.
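If editing `config.json` by hand is inconvenient, the same override can be applied at load time. Here is a minimal sketch using the Hugging Face transformers API (the model id `microsoft/Phi-3-vision-128k-instruct` is an assumption; substitute your own checkpoint or local path):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Assumption: adjust the model id to your checkpoint or local fine-tuning path.
model_id = "microsoft/Phi-3-vision-128k-instruct"

# Load the existing configuration and override attention_dropout
# instead of editing config.json manually.
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
config.attention_dropout = 0.1  # 10% dropout on the attention weights

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    trust_remote_code=True,
)

# Dropout is only active in training mode; model.eval() disables it.
model.train()
print(model.config.attention_dropout)  # sanity check -> 0.1
```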
Closing the issue based on the example provided.

hm-ca:

Thanks @leestott. I'll take it from here!