fabrahman opened this issue 4 years ago
I noticed the freeze/requires_grad logic is commented out in learner_cls. @kaushaltrivedi how does one freeze the BERT layers and train only the added custom layer?
I tried adding a freeze_transformers_layer flag to BertLearner's __init__ that sets requires_grad to False for any parameter whose name contains the model_type in named_parameters(), but it didn't seem to have any effect. The same approach worked for me in another implementation and cut training time substantially.
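For reference, a minimal sketch of the kind of freezing logic I mean (the function name and the use of a `model_type` prefix are illustrative, not fast-bert's actual API):

```python
import torch.nn as nn

def freeze_transformer_layers(model: nn.Module, model_type: str = "bert") -> None:
    """Freeze every parameter of the transformer body, leaving the
    classifier head trainable. `model_type` is assumed to be the prefix
    of the body's parameter names, e.g. "bert.encoder.layer.0...".
    """
    for name, param in model.named_parameters():
        if name.startswith(model_type):
            param.requires_grad = False
```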
Just for clarification: does freezing the layers mean that only a single linear layer (i.e. the classifier head) is trained for classification, rather than all of the layers having their weights updated during training?
That is correct. I also confirmed this by looping over all parameters and printing each name along with its requires_grad setting (summarized for brevity):
```
bert.embeddings.word_embeddings.weight, requires_grad:False
bert.embeddings.position_embeddings.weight, requires_grad:False
bert.embeddings.token_type_embeddings.weight, requires_grad:False
...
bert.encoder.layer.11.output.LayerNorm.bias, requires_grad:False
bert.pooler.dense.weight, requires_grad:False
bert.pooler.dense.bias, requires_grad:False
classifier.weight, requires_grad:True
classifier.bias, requires_grad:True
```
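A listing like the one above can be produced with a loop along these lines (a sketch; `learner.model` is assumed to be the wrapped PyTorch module):

```python
def print_trainable_status(model) -> None:
    # Print each parameter name with its requires_grad flag,
    # matching the listing above.
    for name, param in model.named_parameters():
        print(f"{name}, requires_grad:{param.requires_grad}")

# e.g. print_trainable_status(learner.model)
```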
I wonder whether the default code optimizes the entire model end-to-end, or whether only the additional classifier layer's parameters get updated?
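One way to make this explicit (a sketch, not fast-bert's actual code; the optimizer class and learning rate here are placeholders) is to hand the optimizer only the parameters with requires_grad=True and compare the counts:

```python
import torch

def build_optimizer(model, lr=2e-5):
    # Only parameters with requires_grad=True are passed to the optimizer,
    # so frozen layers receive no updates.
    trainable = [p for p in model.parameters() if p.requires_grad]
    total = sum(1 for _ in model.parameters())
    print(f"optimizing {len(trainable)} of {total} parameter tensors")
    return torch.optim.AdamW(trainable, lr=lr)
```

Even if every parameter were handed to the optimizer, tensors with requires_grad=False accumulate no gradients and are skipped during the step; filtering just makes the intent explicit and easy to verify.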
Thanks