neuralmind-ai / portuguese-bert

Portuguese pre-trained BERT models

Can I use this model as a layer of a larger model? #35

Open Benjamim-EP opened 3 years ago

Benjamim-EP commented 3 years ago

I would like to know how I can use this model as in the example below:

```python
import tensorflow as tf
import tensorflow_hub as hub
from tensorflow.keras import layers


class DCNNBERTEmbedding(tf.keras.Model):

    def __init__(self,
                 nb_filters=50,
                 FFN_units=512,
                 nb_classes=2,
                 dropout_rate=0.1,
                 name="dcnn"):
        super(DCNNBERTEmbedding, self).__init__(name=name)

        # BERT embedding layer
        self.bert_layer = hub.KerasLayer(
            "https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/1",
            name="bert", trainable=False)

        self.bigram = layers.Conv1D(filters=nb_filters,
                                    kernel_size=2,
                                    padding="valid",
                                    activation="relu")
        self.trigram = layers.Conv1D(filters=nb_filters,
                                     kernel_size=3,
                                     padding="valid",
                                     activation="relu")
        self.fourgram = layers.Conv1D(filters=nb_filters,
                                      kernel_size=4,
                                      padding="valid",
                                      activation="relu")
        self.pool = layers.GlobalMaxPool1D()
        self.dense_1 = layers.Dense(units=FFN_units, activation="relu")
        self.dropout = layers.Dropout(rate=dropout_rate)
        if nb_classes == 2:
            self.last_dense = layers.Dense(units=1,
                                           activation="sigmoid")
        else:
            self.last_dense = layers.Dense(units=nb_classes,
                                           activation="softmax")

    # Embed the inputs with BERT
    def embed_with_bert(self, all_tokens):
        # bert_layer returns two outputs: the first relates to the whole
        # sentence (pooled output), the second holds the per-token
        # embeddings. We only want the second one.
        _, embs = self.bert_layer([all_tokens[:, 0, :],   # input ids
                                   all_tokens[:, 1, :],   # attention mask
                                   all_tokens[:, 2, :]])  # segment ids
        return embs

    def call(self, inputs, training):
        x = self.embed_with_bert(inputs)

        x_1 = self.bigram(x)
        x_1 = self.pool(x_1)
        x_2 = self.trigram(x)
        x_2 = self.pool(x_2)
        x_3 = self.fourgram(x)
        x_3 = self.pool(x_3)

        merged = tf.concat([x_1, x_2, x_3], axis=-1)  # (batch_size, 3 * nb_filters)
        merged = self.dense_1(merged)
        merged = self.dropout(merged, training=training)
        output = self.last_dense(merged)

        return output
```
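The part I am unsure about is replacing the English TF Hub layer with BERTimbau. A hedged, untested sketch of what I have in mind, assuming the checkpoint can be loaded through the transformers library under the published hub ID `neuralmind/bert-base-portuguese-cased`:

```python
import tensorflow as tf
from transformers import TFBertModel

# Assumption: BERTimbau loaded via transformers instead of TF Hub;
# from_pt=True converts the published PyTorch weights on the fly.
bert = TFBertModel.from_pretrained("neuralmind/bert-base-portuguese-cased",
                                   from_pt=True)

def embed_with_bertimbau(all_tokens):
    # last_hidden_state holds the per-token embeddings of shape
    # (batch_size, seq_len, hidden_size) -- the analogue of the hub
    # layer's second output used in embed_with_bert above.
    outputs = bert(input_ids=all_tokens[:, 0, :],
                   attention_mask=all_tokens[:, 1, :],
                   token_type_ids=all_tokens[:, 2, :])
    return outputs.last_hidden_state
```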

fabiocapsouza commented 3 years ago

Hi @Benjamim-EP ,

I am not a TensorFlow user, so unfortunately I can't give you directions. But it should be possible to adapt a working example for English BERT (or another language) using the BERTimbau TensorFlow checkpoint (weights) and config file. I'll leave this issue open so others may help you. Please share your experience with us if you find a solution :)

dimitreOliveira commented 2 years ago

Hi @Benjamim-EP and @fabiocapsouza, using this model as part of another model should be straightforward. Here is a minimal example that uses BERT as the base and adds a classifier head on top:

```python
import tensorflow as tf
from transformers import TFBertModel

n_classes = 2  # number of target classes

# Load BERT with the Hugging Face API
encoder = TFBertModel.from_pretrained('path/to/bert_dir/', from_pt=True)

# Build a model composed with BERT
# Input layers (matching the tokenizer outputs)
input_ids = tf.keras.layers.Input(shape=(None,), dtype=tf.int32, name='input_ids')
token_type_ids = tf.keras.layers.Input(shape=(None,), dtype=tf.int32, name='token_type_ids')
attention_mask = tf.keras.layers.Input(shape=(None,), dtype=tf.int32, name='attention_mask')

# BERT encoder (the pooled [CLS] representation)
encoded = encoder({"input_ids": input_ids,
                   "token_type_ids": token_type_ids,
                   "attention_mask": attention_mask})['pooler_output']
# Classifier head
outputs = tf.keras.layers.Dense(n_classes, activation='softmax', name='classifier')(encoded)

# Build the model
model = tf.keras.models.Model(inputs=[input_ids, token_type_ids, attention_mask], outputs=outputs)
```

The only issue I faced is that TFBertModel is not able to load the original TensorFlow checkpoint files, so you need to load the PyTorch weights and pass from_pt=True.
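To round this off, a quick sketch of how the assembled model could be exercised end to end; the tokenizer path, the toy sentences, and the dummy labels are placeholders for illustration, not part of the original example:

```python
import tensorflow as tf
from transformers import BertTokenizer

# Tokenizer from the same directory as the checkpoint (placeholder path)
tokenizer = BertTokenizer.from_pretrained('path/to/bert_dir/')

# Toy batch: two Portuguese sentences with dummy binary labels
batch = tokenizer(['O filme foi ótimo.', 'Não gostei do final.'],
                  padding=True, truncation=True, return_tensors='tf')
labels = tf.constant([1, 0])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit([batch['input_ids'], batch['token_type_ids'], batch['attention_mask']],
          labels, epochs=1)
```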

jvanz commented 2 years ago

I believe this example should be in the README as an example of how to use it with TensorFlow. ;)