kpe / bert-for-tf2

A Keras TensorFlow 2.0 implementation of BERT, ALBERT and adapter-BERT.
https://github.com/kpe/bert-for-tf2
MIT License

Problems with fine-tuning the LayerNormalization layer and exploring all layers of the model #70

Closed BaoshengHeTR closed 4 years ago

BaoshengHeTR commented 4 years ago

I have a question. When I call l_bert.apply_adapter_freeze(), will that freeze the original LayerNorm (LN) layers as well? From your example, the function:

def freeze_bert_layers(l_bert):
    """
    Freezes all but LayerNorm and adapter layers - see arXiv:1902.00751.
    """

is used to freeze all layers except the LN and adapter layers. However, the flatten_layers function called in freeze_bert_layers does not seem to work properly for me: it cannot locate the layers that are supposed to remain trainable. It would be great if you could show a way to check the trainable status of each layer.
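
For reference, this is roughly how I try to inspect the status (a minimal sketch; flatten_layers here is my reproduction of the helper from the example notebook, and l_bert is the already-built BertModelLayer from the README example):

from tensorflow import keras

def flatten_layers(root_layer):
    """Recursively yield a Keras layer and all of its tracked sub-layers."""
    if isinstance(root_layer, keras.layers.Layer):
        yield root_layer
    for layer in root_layer._layers:
        for sub_layer in flatten_layers(layer):
            yield sub_layer

# print the trainable flag of every (sub-)layer reachable from l_bert
for layer in flatten_layers(l_bert):
    print("{:40s} trainable={}".format(layer.name, layer.trainable))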

Finally, I want to confirm: the pre-trained BERT model used in bert-for-tf2 does not include the pooling layer with the tanh activation, right?

Thanks.

BaoshengHeTR commented 4 years ago

Running flatten_layers, I get:

for l in flatten_layers(l_bert):
    print(l)

<bert.model.BertModelLayer object at 0x7f1e68e73940>
<bert.embeddings.BertEmbeddingsLayer object at 0x7f1e6f8517f0>
<bert.transformer.TransformerEncoderLayer object at 0x7f1e68db6400>

and the sub-layer list appears to be empty:

l_bert._layers[0]._layers
[]
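
Since _layers does not seem to be populated here, an alternative that might work (a minimal sketch, assuming the model has already been built, e.g. by wrapping l_bert into a keras.Model with defined inputs) is to walk the sub-layers through the tf.Module API, which Keras layers inherit from in TF 2.x:

from tensorflow import keras

# in TF 2.x every keras.layers.Layer is also a tf.Module, so .submodules
# lists all nested sub-layers once they have been created (i.e. after build)
for m in l_bert.submodules:
    if isinstance(m, keras.layers.Layer):
        print("{:50s} trainable={} weights={}".format(
            m.name, m.trainable, len(m.trainable_weights)))

That would at least make it possible to verify which LayerNorm and adapter layers are actually left trainable after freeze_bert_layers.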

kpe commented 4 years ago

@BaoshengHeTR - yes, the final pooling layer used in pre-training is currently not included in BertModelLayer, and loading its weights from a pre-trained checkpoint is not supported.
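
If you need a pooled output downstream, you would have to add a pooler on top of BertModelLayer yourself and train it from scratch. A minimal sketch (max_seq_len, hidden_size and the layer names are illustrative; the Dense pooler below is randomly initialized, not loaded from the checkpoint):

from tensorflow import keras

max_seq_len = 128   # illustrative sequence length
hidden_size = 768   # BERT-base hidden size

# l_bert is a BertModelLayer created as in the README, e.g.
#   l_bert = bert.BertModelLayer.from_params(bert_params, name="bert")
input_ids  = keras.layers.Input(shape=(max_seq_len,), dtype='int32', name='input_ids')
seq_output = l_bert(input_ids)                                         # [batch, seq_len, hidden]
cls_token  = keras.layers.Lambda(lambda seq: seq[:, 0, :])(seq_output) # first ([CLS]) token
pooled     = keras.layers.Dense(hidden_size, activation='tanh',
                                name='pooler')(cls_token)              # freshly initialized pooler
model = keras.Model(inputs=input_ids, outputs=pooled)

The Dense pooler would then simply be trained together with the rest of the model during fine-tuning.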