xuyige / BERT4doc-Classification

Code and source for paper ``How to Fine-Tune BERT for Text Classification?``
Apache License 2.0

Questions about discriminative_fine_tuning #5

Open wlhgtc opened 4 years ago

wlhgtc commented 4 years ago

In Section 5.4.3: "We find that assigning a lower learning rate to the lower layer is effective for fine-tuning BERT, and an appropriate setting is ξ=0.95 and lr=2.0e-5." Compared with the code at https://github.com/xuyige/BERT4doc-Classification/blob/master/codes/fine-tuning/run_classifier.py#L812, it seems that you divide the BERT layers into 3 parts (4 layers per part) and set a different learning rate for each part. Some questions about this:

  1. How does the decay factor 0.95 relate to the number 2.6 in the code?
  2. The final classification layer does not seem to be included; is there no need to set a learning rate for it?
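For context, discriminative fine-tuning as described in the paper assigns each encoder layer a learning rate decayed by the factor ξ from the layer above it, so the top layer trains at the base rate and lower layers train more slowly. A minimal sketch of that schedule (the function name and defaults are illustrative, not taken from the repository's code):

```python
# Hypothetical sketch of layer-wise discriminative fine-tuning rates:
# lr(layer) = base_lr * xi ** (distance from the top layer).
# Values xi=0.95 and base_lr=2e-5 follow Section 5.4.3 of the paper.

def layerwise_lrs(num_layers=12, base_lr=2e-5, xi=0.95):
    """Return one learning rate per encoder layer, lowest layer first.

    The top layer gets the full base_lr; each layer below it is
    scaled down by another factor of xi.
    """
    return [base_lr * xi ** (num_layers - 1 - i) for i in range(num_layers)]

lrs = layerwise_lrs()
# The ratio between adjacent layers is xi, so the bottom layer of a
# 12-layer encoder trains at base_lr * 0.95**11, roughly 57% of base_lr.
```

The code linked in the question instead groups the 12 layers into 3 blocks and scales between blocks, which is why a different constant (2.6) appears there.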
xuyige commented 4 years ago

Thank you for your issue!

  1. The number 2.6 was used in the initial experiments; after that, we used run_classifier_discriminative.py for discriminative fine-tuning.
  2. The link to run_classifier_discriminative.py is https://github.com/xuyige/BERT4doc-Classification/blob/master/codes/fine-tuning/run_classifier_discriminative.py
  3. The classifier layer is included in run_classifier_discriminative.py.
wlhgtc commented 4 years ago

Thanks for your reply, I will try it!