zhuchen03 / FreeLB

Adversarial Training for Natural Language Understanding

A few questions about FreeLB and dropout #14

Closed zjiehang closed 3 years ago

zjiehang commented 3 years ago

Thanks for your valuable work on "FreeLB", which indeed improves my models' performance. One question arises about the relationship between dropout and adversarial training: Section 3.3 of the paper https://arxiv.org/pdf/1909.11764.pdf suggests using the same dropout mask in each forward-backward step, but I can't find such an implementation in the code (or maybe I missed it?). Does using the same mask affect the results? In my own experiments, I ignored the same-mask suggestion and still saw improved performance. Many thanks!

zhuchen03 commented 3 years ago

Hey, thanks for trying it out!

We did implement the modified dropout in the code for both fairseq and Huggingface Transformers. Most of the modifications are in the modeling code. For easier reference, see the lines involving the variable dp_mask in huggingface-transformers/src/transformers/modeling_albert.py.
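For intuition, here is a minimal sketch (assuming PyTorch) of the idea behind dp_mask; it is not the repository's actual modeling_albert.py code, and the names ReusableDropout and the 3-step loop are illustrative only. The point is that the mask is sampled once on the first forward pass and then passed back in, so every adversarial forward-backward step sees the same dropped sub-network.

```python
# Minimal sketch, not FreeLB's actual code: a dropout layer that can reuse
# an externally supplied mask across the K adversarial steps.
import torch
import torch.nn as nn


class ReusableDropout(nn.Module):
    """Inverted dropout that can reuse an externally supplied mask."""

    def __init__(self, p: float = 0.1):
        super().__init__()
        self.p = p

    def forward(self, x: torch.Tensor, dp_mask: torch.Tensor = None):
        if not self.training or self.p == 0.0:
            return x, None
        if dp_mask is None:
            # First forward pass: sample a fresh mask and rescale (inverted dropout).
            dp_mask = (torch.rand_like(x) > self.p).to(x.dtype) / (1.0 - self.p)
        # Later passes reuse the same mask, so the sub-network stays fixed
        # while the adversarial perturbation is being optimized.
        return x * dp_mask, dp_mask


if __name__ == "__main__":
    torch.manual_seed(0)
    drop = ReusableDropout(p=0.1).train()
    hidden = torch.randn(2, 4, 8)

    mask = None
    for step in range(3):  # e.g., the K forward-backward steps of adversarial training
        out, mask = drop(hidden, dp_mask=mask)  # mask is fixed after step 0
        print(step, out.sum().item())           # identical values: same mask reused
```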

For a comparison of the results with and without such a modification, please refer to Table 4 in the paper.

zjiehang commented 3 years ago

Wow, thanks for your prompt reply! I get it now. I use the vanilla transformers framework and was confused about the "dp_mask" parameter. It does play a role in your revised "Albert" version, where it controls the dropout mask used in each forward pass. Thanks. Perhaps we need to balance performance against the amount of code modification when adopting the same-mask suggestion, especially with the vanilla transformers framework (e.g., applying the same modification to other BERT-variant models as you did for "Albert").