Hello!

First, I can see several issues in the way you want to train the model:

- In your `tokenize` function, the first element of the tuple (`a.ids`) is taken as the input, and the second (`a.attention_mask`) is taken as the label. Hence the error you get.
- In `tf.keras.models.Model` you define the `inputs` and the `outputs` to be the same. This is not correct either; you have to run the model once and then give this output.

@jplu I realized my mistake and I changed the code to this:
```python
def tokenize(sentence):
    sentence = sentence.numpy().decode('utf-8')
    a = tokenizer.encode(sentence)
    return tf.constant(a.ids, tf.int32), tf.constant(a.attention_mask, tf.int32)

def get_tokenized(sentence):
    return tf.py_function(tokenize, inp=[sentence], Tout=[tf.int32, tf.int32])

def get_tokenized_final(a, b):
    return (a, b), None

dataset = tf.data.Dataset.from_tensor_slices(lines)
dataset = dataset.map(get_tokenized, num_parallel_calls=tf.data.AUTOTUNE).map(get_tokenized_final, num_parallel_calls=tf.data.AUTOTUNE)
```
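(Editor's side note, not from the original thread: tensors returned by `tf.py_function` come back with unknown static shapes, which Keras typically needs in order to match the fixed-size `Input` layers below. A minimal sketch that restores them, assuming every sentence is padded/truncated to length 128; `set_shapes` is a hypothetical helper, meant to be mapped right after `get_tokenized` and before `get_tokenized_final`:)

```python
def set_shapes(ids, mask):
    # tf.py_function loses static shape information; restore it so that
    # downstream Keras layers see fixed-length (128,) tensors.
    ids.set_shape((128,))
    mask.set_shape((128,))
    return ids, mask

dataset = dataset.map(set_shapes, num_parallel_calls=tf.data.AUTOTUNE)
```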
```python
import tensorflow as tf
from transformers import DistilBertConfig, TFDistilBertForMaskedLM

config = DistilBertConfig(vocab_size=30000)
model = TFDistilBertForMaskedLM(config)
inp1 = tf.keras.layers.Input(shape=(128,), dtype=tf.int32, name="input_ids")
inp2 = tf.keras.layers.Input(shape=(128,), dtype=tf.int32, name="attention_mask")
op = model([inp1, inp2])
model = tf.keras.models.Model(inputs=[inp1, inp2], outputs=model.output)
```
Now the model throws two warnings:

```
WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
```
and then throws the final error:

```
ValueError: in user code:

    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:805 train_function *
        return step_function(self, iterator)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:795 step_function **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:1259 run
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2730 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:3417 _call_for_each_replica
        return fn(*args, **kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:788 run_step **
        outputs = model.train_step(data)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:757 train_step
        self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:498 minimize
        return self.apply_gradients(grads_and_vars, name=name)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:598 apply_gradients
        grads_and_vars = optimizer_utils.filter_empty_gradients(grads_and_vars)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/optimizer_v2/utils.py:79 filter_empty_gradients
        ([v.name for _, v in grads_and_vars],))

    ValueError: No gradients provided for any variable: ['tf_distil_bert_for_masked_lm_1/distilbert/embeddings/word_embeddings/weight:0', 'tf_distil_bert_for_masked_lm_1/distilbert/embeddings/position_embeddings/embeddings:0', 'tf_distil_bert_for_masked_lm_1/distilbert/embeddings/LayerNorm/gamma:0', 'tf_distil_bert_for_masked_lm_1/distilbert/embeddings/LayerNorm/beta:0', 'tf_distil_bert_for_masked_lm_1/distilbert/transformer/layer_._0/attention/q_lin/kernel:0', 'tf_distil_bert_for_masked_lm_1/distilbert/transformer/layer_._0/attention/q_lin/bias:0', 'tf_distil_bert_for_masked_lm_1/distilbert/transformer/layer_._0/attention/k_lin/kernel:0', 'tf_distil_bert_for_masked_lm_1/distilbert/transformer/layer_._0/attention/k_lin/bias:0', 'tf_distil_bert_for_masked_lm_1/distilbert/transformer/layer_._0/attention/v_lin/kernel:0', 'tf_distil_bert_for_masked_lm_1/distilbert/transformer/layer_._0/attention/v_lin/bias:0', 'tf_distil_bert_for_masked_lm_1/distilbert/transformer/layer_._0/attention/out_lin/kernel:0', 'tf_distil_bert_for_masked_lm_1/distilbert/transformer/layer_._0/attention/out_lin/bias:0', 'tf_distil_bert_for_masked_lm_1/distilbert/transformer/layer_._0/sa_layer_norm/gamma:0', 'tf_distil_bert_for_masked_lm_1/distilbert/transformer/layer_._0/sa_layer_norm/beta:0', 'tf_distil_bert_for_masked_lm_1/distilbert/transformer/layer_._0/ffn/lin1/kernel:0', 'tf_distil_bert_for_masked_lm_1/distilbert/transformer/layer_._0/ffn/lin1/bias:0', 'tf_distil_bert_for_masked_lm_1/distilbert/tra...
```
Any idea what I am doing wrong?
You cannot do `model.output`; as said in my previous message, you have to run the model once to see how the output looks :)
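(Editor's note: concretely, "running the model once" means calling it on the symbolic `Input` tensors and wiring the returned tensor in as the functional model's output, exactly as the dummy example further down does. A minimal sketch:)

```python
# Call the model once on the symbolic inputs and use the returned
# tensor as the output, instead of reading model.output.
output = model([inp1, inp2])
model = tf.keras.models.Model(inputs=[inp1, inp2], outputs=output)
```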
@jplu Could you tell me exactly what you mean by "run" the model? If I pass a sample array with all ones, it gives me a broadcasting error, as follows:
```python
config = DistilBertConfig(vocab_size=30000)
model = TFDistilBertForMaskedLM(config)
inp1 = tf.keras.layers.Input(shape=(128,), dtype=tf.int32, name="input_ids")
inp2 = tf.keras.layers.Input(shape=(128,), dtype=tf.int32, name="attention_mask")
_ = model([inp1, inp2])

# Error is thrown for this call
a = tf.ones((128,), dtype=tf.int32)
model((a, a))
```
The error is as attached:

```
InvalidArgumentError: Incompatible shapes: [512,768] vs. [128,768] [Op:BroadcastTo]
```
More specifically, the error is raised in modeling_tf_distilbert.py:

```
    183     if position_ids is None:
--> 184         position_embeds = self.position_embeddings(position_ids=inputs_embeds)
    185     else:
    186         position_embeds = self.position_embeddings(position_ids=position_ids)
```
If by "run" you mean calling fit on the model then it raises the same gradient error
Here is a dummy example:
```python
import tensorflow as tf
from transformers import TFDistilBertForMaskedLM, DistilBertTokenizer, DistilBertConfig

config = DistilBertConfig(vocab_size=30000)
model = TFDistilBertForMaskedLM(config)
inp1 = tf.keras.layers.Input(shape=(128,), dtype=tf.int32, name="input_ids")
inp2 = tf.keras.layers.Input(shape=(128,), dtype=tf.int32, name="attention_mask")
output = model([inp1, inp2])
model = tf.keras.models.Model(inputs=[inp1, inp2], outputs=[output])

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
data = tokenizer(["Hello1", "Hello2", "Hello3"], truncation=True, max_length=128, padding="max_length", return_tensors="tf")
labels = tf.ones((3, 128), dtype=tf.int32)
X = tf.data.Dataset.from_tensor_slices((dict(data), labels)).batch(1)

loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(loss=loss, optimizer="adam")
model.fit(X, epochs=1)
```
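(Editor's note: the key difference from the earlier failing attempt is that `outputs=[output]` uses the tensor returned by actually calling the model, so the functional graph presumably connects the inputs to the trainable weights and gradients can flow, whereas `model.output` did not provide such a connected tensor.)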
@jplu Thanks for this, but it tokenizes all the data up front and then loads it as a `tf.data.Dataset`. I was looking for an implementation where tokenization is integrated into the pipeline itself and done on the fly. I found this issue on TensorFlow, but there are no fixes for it yet. Do you have any idea how to do this? My dataset fits in Colab memory, but it cannot be fully tokenized in memory.
Sorry, you cannot do this.
Okay. Thanks for all the help!
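(Editor's note: one common workaround, not endorsed in the thread above, is to keep tokenization in plain Python and stream examples with `tf.data.Dataset.from_generator` (TF 2.4+ for `output_signature`); tokenization then happens lazily per example, so the fully tokenized corpus never has to sit in memory at once, though it still runs outside the TF graph, which is presumably what "you cannot do this" refers to. A sketch, assuming `lines` is the in-memory list of raw strings, `generate_examples` is a hypothetical helper, and the tokenizer pads/truncates to length 128:)

```python
import numpy as np
import tensorflow as tf

def generate_examples():
    # Tokenize one line at a time in ordinary Python.
    for line in lines:
        enc = tokenizer.encode(line)
        yield (
            np.asarray(enc.ids, dtype=np.int32),
            np.asarray(enc.attention_mask, dtype=np.int32),
        )

dataset = tf.data.Dataset.from_generator(
    generate_examples,
    output_signature=(
        tf.TensorSpec(shape=(128,), dtype=tf.int32),
        tf.TensorSpec(shape=(128,), dtype=tf.int32),
    ),
)
```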
Environment info
`transformers` version: 4.3.2

Who can help
@jplu
Information
Model I am using (Bert, XLNet ...): TFDistilBert
The problem arises when using:
The tasks I am working on is:
To reproduce
Steps to reproduce the behavior:
Error:
I have cross-checked the output shape and input dimensions. If this is not the correct way, then how exactly do I train a TF DistilBert model from scratch?
Expected behavior
Training should start as soon as `fit` is called.