Closed jonathanbratt closed 5 years ago
If I understand this post correctly, in most BERT-related articles, when 12 layers are mentioned for the uncased model, this corresponds to layer_output_1 through layer_output_12 in the output from RBERT::extract_features(), yes?
That is correct. When you see layer_output_0 in the RBERT output, it corresponds to the vectors that are fed into the first transformer layer.
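The off-by-one between "BERT's 12 layers" and the 13 layer_output_* entries can be illustrated with a minimal, language-agnostic sketch (plain NumPy, not RBERT's actual API; the toy_extract_features name and the random linear map standing in for a transformer layer are illustrative assumptions): entry 0 holds the embeddings fed into the first transformer layer, and entries 1 through 12 hold the layer outputs.

```python
import numpy as np

def toy_extract_features(embeddings, n_layers=12, seed=0):
    """Mirror the layer_output_0..layer_output_n convention:
    entry 0 is the raw input embeddings, entry i is layer i's output.
    (Toy stand-in, NOT RBERT's implementation.)"""
    rng = np.random.default_rng(seed)
    hidden_size = embeddings.shape[-1]
    outputs = {"layer_output_0": embeddings}  # pre-transformer input
    h = embeddings
    for i in range(1, n_layers + 1):
        # stand-in for one transformer layer: a fixed random linear map
        w = rng.standard_normal((hidden_size, hidden_size)) / np.sqrt(hidden_size)
        h = np.tanh(h @ w)
        outputs[f"layer_output_{i}"] = h
    return outputs

# two tokens, hidden size 8
emb = np.ones((2, 8))
feats = toy_extract_features(emb)
print(len(feats))                                     # 13: embeddings + 12 layer outputs
print(np.array_equal(feats["layer_output_0"], emb))   # True: entry 0 is the untouched input
```

So counting "12 layers" in an article maps to layer_output_1..layer_output_12, with layer_output_0 being the embedding input.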
For completeness, it would be good to return the bare token embeddings (before any transformer layers) along with the layer outputs.