jbrry / Irish-BERT

Repository to store helper scripts for creating an Irish BERT model.

Role of top layers in fine-tuning BERT #94

Open jowagner opened 2 years ago

jowagner commented 2 years ago

Assumption: Fine-tuning BERT for dependency parsing changes the top layers in two ways: it makes the information from the lower layers available to the parser, and it outsources to the top layers of BERT some of the processing that normally happens in the parser's Bi-LSTM layers.

Idea for testing this (a rough code sketch follows the steps):

  1. Re-initialise the parameters of the top 3 or so layers randomly,
  2. freeze the BERT layers that have not been re-initialised,
  3. train the top layers on the fine-tuning task,
  4. unfreeze BERT and
  5. fine-tune all layers as usual.
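
For concreteness, a minimal PyTorch/transformers sketch of steps 1, 2 and 4 could look as follows. This is not from the issue: the checkpoint name and the layer count are placeholders, it assumes a HuggingFace-style `BertModel` with an `encoder.layer` stack, and the re-initialisation reuses transformers' internal `_init_weights` helper.

```python
from transformers import AutoModel

# Checkpoint name is a placeholder; any BERT-style model with an
# `encoder.layer` stack works the same way.
model = AutoModel.from_pretrained("DCU-NLP/bert-base-irish-cased-v1")

NUM_REINIT = 3  # "top 3 or so layers"

# Step 1: re-initialise the top layers randomly, reusing the model's
# own initialisation scheme (internal helper, may change across versions).
for layer in model.encoder.layer[-NUM_REINIT:]:
    layer.apply(model._init_weights)

# Step 2: freeze the embeddings and the lower layers that keep
# their pre-trained weights.
model.embeddings.requires_grad_(False)
for layer in model.encoder.layer[:-NUM_REINIT]:
    layer.requires_grad_(False)

# Step 3: train the parser together with the re-initialised top
# layers on the fine-tuning task (training loop omitted).

# Step 4: unfreeze all of BERT ...
model.requires_grad_(True)

# Step 5: ... and fine-tune all layers as usual (second training phase).
```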

If this performs just as well as the standard procedure, it would mean that the information the top layers acquired during pre-training does not contribute anything to the final task.

See also issue #93.