Same problem.
Which version are you running? Is it possible that this fix resolved your issue? Can you try installing from master to check?
This doesn't look like the same thing I was fixing in #3103 so I doubt that that helped.
In particular, from the `Network` docstring:
> Two types of `Networks` exist: Graph Networks and Subclass Networks. Graph
> networks are used in the Keras Functional and Sequential APIs. Subclassed
> networks are used when a user subclasses the `Model` class. In general,
> more Keras features are supported with Graph Networks than with Subclassed
> Networks, specifically:
> - Model cloning (`keras.models.clone`)
> - Serialization (`model.get_config()/from_config`, `model.to_json()/to_yaml()`)
> - Whole-model saving (`model.save()`)
Based on the traceback, the model is apparently a subclass model, so it needs to override `get_config` in order to support serialization. (The fix in #3103 is for a problem with using TF*MainLayer classes within a Keras model, so it doesn't address this.)
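For context, a minimal sketch (with illustrative names, not taken from this issue) of what overriding `get_config` looks like in a subclassed Keras model:

```python
import tensorflow as tf

class MyModel(tf.keras.Model):
    def __init__(self, units=64, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.dense = tf.keras.layers.Dense(units)

    def call(self, inputs):
        return self.dense(inputs)

    def get_config(self):
        # Return the constructor arguments so Keras can re-create
        # the model from its config during deserialization.
        return {"units": self.units}
```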
@gthb so is there any way to save the models wrapped in keras?
I'm sure there's some way; it's just a question of how much custom work you have to do (probably some, given the above quote).
But are you sure you need to be using `TFBertModel` and not `TFBertMainLayer` for your hidden layer? `TFBertModel` is literally just this (plus docstrings):
```python
class TFBertModel(TFBertPreTrainedModel):
    def __init__(self, config, *inputs, **kwargs):
        super().__init__(config, *inputs, **kwargs)
        self.bert = TFBertMainLayer(config, name="bert")

    def call(self, inputs, **kwargs):
        outputs = self.bert(inputs, **kwargs)
        return outputs
```
... so unless you need something in particular from `TFBertModel`'s superclasses, maybe using `TFBertMainLayer` directly would simplify things for you?
Thanks @gthb for your reply. I've updated my colab and it now works after I changed the following line:

```python
model = TFBertModel.from_pretrained('bert-base-cased', config=config)
```

to:

```python
model = TFBertMainLayer(config=config)
```

However, I can't call the `from_pretrained` function. Is the class implicitly configured by providing the config options from `BertConfig`?
Another point: I am facing a problem during training when the model is wrapped in Keras.
Using:

```python
embedding = model([word_inputs, mask_inputs, seg_inputs])[0]
```

I get:

```
tensorflow:Gradients do not exist for variables ['tf_bert_main_layer/pooler/dense/kernel:0', 'tf_bert_main_layer/pooler/dense/bias:0'] when minimizing the loss.
```
I would like to use layers from transformers combined with a CNN (which requires 3D tensors as input), but in order to keep the weights learned by the model I tried the pooler output (which provides 2D tensors): `model([word_inputs, mask_inputs, seg_inputs])[1]`. However, it doesn't fit with the CNN:

```
ValueError: Input 0 of layer input is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 768]
```
Do you have an idea how I should reshape it to fit with a Conv1D layer? The error can be reproduced from my colab: https://colab.research.google.com/drive/18HYwffkXCylPqeA-8raL82vfwOjb-aLP
> I can't call the `from_pretrained` function. Is the class implicitly configured by providing the config options from `BertConfig`?
I'm guessing you mean that `TFBertMainLayer` does not have a `from_pretrained` method. Yep, but `BertConfig` does, so this works:
```python
from transformers import BertConfig, TFBertMainLayer

config_name = "bert-base-uncased"  # for instance
config = BertConfig.from_pretrained(config_name)
main_layer = TFBertMainLayer(config)
```
> Do you have an idea how I should reshape it to fit with a Conv1D layer?
Isn't your Conv1D layer intended to convolve over the token sequence? The pooled output produces a single vector representing the whole sequence, not separate vectors for each token of the sequence. So you are probably mistaken in trying to use the pooled output (or I'm not understanding your intent).
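To illustrate (a sketch using the names from the snippets above; shapes assume `bert-base` with hidden size 768 and inputs of length `max_seq_length`):

```python
outputs = main_layer([word_inputs, mask_inputs, seg_inputs])
sequence_output = outputs[0]  # (batch, max_seq_length, 768): one vector per token,
                              # the 3D input a Conv1D layer can convolve over
pooled_output = outputs[1]    # (batch, 768): one vector per sequence, hence the
                              # "expected ndim=3, found ndim=2" error above
```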
Yes, you're right, I had misunderstood the nature of the pooler output (I was probably misled by these related topics: #2256 and #1727). So when I use the `last_hidden_state` I get this warning:

```
tensorflow:Gradients do not exist for variables ['tf_bert_main_layer/pooler/dense/kernel:0', 'tf_bert_main_layer/pooler/dense/bias:0'] when minimizing the loss.
```

but the model seems to train. However, when I load it I get:
File "/home/X/", line 69, in train
loaded_model = tf.keras.models.load_model(dirModel+self.options.t+'cnn.h5')
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/save.py", line 146, in load_model
return hdf5_format.load_model_from_hdf5(filepath, custom_objects, compile)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 193, in load_model_from_hdf5
model._make_train_function()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 2057, in _make_train_function
params=self._collected_trainable_weights, loss=self.total_loss)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/optimizer_v2/optimizer_v2.py", line 503, in get_updates
grads = self.get_gradients(loss, params)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/optimizer_v2/optimizer_v2.py", line 397, in get_gradients
"K.argmax, K.round, K.eval.".format(param))
ValueError: Variable <tf.Variable 'tf_bert_main_layer_1/pooler/dense/kernel:0' shape=(768, 768) dtype=float32> has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
Here is the model used:
```python
import tensorflow as tf
from transformers import BertConfig, TFBertMainLayer

# Define inputs
word_inputs = tf.keras.layers.Input(shape=(max_seq_length,), name='word_inputs', dtype='int32')
mask_inputs = tf.keras.layers.Input(shape=(max_seq_length,), name='mask_inputs', dtype='int32')
seg_inputs = tf.keras.layers.Input(shape=(max_seq_length,), name='seg_inputs', dtype='int32')

# Call BERT model
config_name = "bert-base-uncased"  # for instance
config = BertConfig.from_pretrained(config_name)
main_layer = TFBertMainLayer(config)
embedding = main_layer([word_inputs, mask_inputs, seg_inputs])[0]
conv = tf.keras.layers.Conv1D(128, kernel_size=5, activation='relu', name="input")(embedding)
pooling = tf.keras.layers.MaxPooling1D()(conv)
lstm = tf.keras.layers.LSTM(128)(pooling)
dense = tf.keras.layers.Dense(64, activation='relu')(lstm)

# Final output
outputs = tf.keras.layers.Dense(1, activation='sigmoid', name='outputs')(dense)

# Compile model (optimizer and loss are defined elsewhere)
model = tf.keras.Model(inputs=[word_inputs, mask_inputs, seg_inputs], outputs=outputs)
model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])

model.save('cnn.h5')
loaded_model = tf.keras.models.load_model('cnn.h5')
```
So what am I doing wrong?
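One possible workaround for the pooler-gradient errors above (an assumption, not something confirmed in this thread): since the pooler is never used when only `last_hidden_state` feeds the rest of the graph, its weights can be excluded from training so the optimizer never asks for their gradients:

```python
# TFBertMainLayer exposes its pooler sub-layer; marking it non-trainable
# removes its kernel/bias from the trainable weights the optimizer tracks.
main_layer.pooler.trainable = False
```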
@gthb

> ... so unless you need something in particular from TFBertModel's superclasses, maybe using TFBertMainLayer directly would simplify things for you?

Simply initializing `TFBertMainLayer` as `main_layer = TFBertMainLayer(config)` won't load pretrained parameters, as opposed to `TFBertModel.from_pretrained(...)`, right?
> won't load pretrained parameters as opposed to TFBertModel.from_pretrained(...), right?

Oops, yes, there's that little thing! 😄 You can load the weights e.g. like this:

```python
from transformers import TFBertPreTrainedModel
from transformers.file_utils import cached_path

bert_weights_file = TFBertPreTrainedModel.pretrained_model_archive_map[config_name]
bert_weights_file = cached_path(bert_weights_file)
model.load_weights(bert_weights_file, by_name=True)
```
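One caveat (an assumption based on how Keras matches weights by layer name, not something stated in this thread): for `by_name=True` to pick up the BERT weights, the main layer presumably needs the same name it gets inside `TFBertModel`:

```python
main_layer = TFBertMainLayer(config, name="bert")
```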
> > won't load pretrained parameters as opposed to TFBertModel.from_pretrained(...), right?
>
> Oops, yes, there's that little thing! You can load the weights e.g. like this:
>
> ```python
> bert_weights_file = TFBertPreTrainedModel.pretrained_model_archive_map[config_name]
> bert_weights_file = cached_path(bert_weights_file)
> model.load_weights(bert_weights_file, by_name=True)
> ```
I'm getting this error using transformers version 2.11.0:

```
AttributeError: type object 'TFBertPreTrainedModel' has no attribute 'pretrained_model_archive_map'
```

I'm using this syntax in my code:

```python
config = BertConfig.from_pretrained(config_name)
bert_weights_file = TFBertPreTrainedModel.pretrained_model_archive_map[config_name]
```
@PoriNiki yeah, from a quick `git log -S pretrained_model_archive_map`, that attribute went away in https://github.com/huggingface/transformers/pull/4636 “Kill model archive maps”, merged to master in https://github.com/huggingface/transformers/commit/d4c2cb402d6674211726fd5f4803d1090664e438 and first released in v2.11.0.
By staring at `TFPreTrainedModel.from_pretrained` a bit, the right way ought to be something like:

```python
from transformers.file_utils import cached_path, hf_bucket_url, TF2_WEIGHTS_NAME

bert_weights_file_url = hf_bucket_url(config_name, filename=TF2_WEIGHTS_NAME)
bert_weights_file = cached_path(bert_weights_file_url)
```

(not tested)
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I still have this issue. I can't save my model, only its weights.
For other people (@ch-hristov) still having trouble with this, I wrote up an explanation and workarounds on stackoverflow: https://stackoverflow.com/questions/62482511/tfbertmainlayer-gets-less-accuracy-compared-to-tfbertmodel/64000378#64000378
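For reference, a minimal sketch of the weights-only workaround mentioned above, assuming a hypothetical `build_model()` helper that recreates the exact architecture from the earlier snippet:

```python
# Saving weights works even when whole-model saving (model.save) fails.
model.save_weights('cnn_weights.h5')

# Later: rebuild the same architecture, then restore the trained weights.
fresh_model = build_model()  # hypothetical helper re-creating the model above
fresh_model.load_weights('cnn_weights.h5')
```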
It seems like it would be useful to smooth out this workflow, as many people using Keras will run into this issue when they try to save their model. @gthb What do you think about adding something like `from_pretrained` to `MainLayer`, and pulling out the logic from `TFPreTrainedModel.from_pretrained` to support both?
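In the meantime, one way to get a pretrained `TFBertMainLayer` without such an API (a sketch relying on the fact that `TFBertModel` stores its main layer as `.bert`, per the class definition quoted earlier):

```python
from transformers import TFBertModel

def main_layer_from_pretrained(pretrained_name):
    # Hypothetical helper: instantiate the full model so from_pretrained
    # handles downloading and loading, then keep only its main layer.
    full_model = TFBertModel.from_pretrained(pretrained_name)
    return full_model.bert  # a TFBertMainLayer carrying pretrained weights
```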
Sounds good, but I have just switched jobs and am not using transformers, don't really have the cycles to help, sorry!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hi,
Also encountering this issue; I couldn't make the solution by @dmlicht work yet. Can anyone provide further feedback on that?
Also, will this issue be addressed by the HF team?
Hi all,
Sorry for my naive question, but I am trying to save my Keras model (<class 'tensorflow.python.keras.engine.training.Model'>) in which I use the TFBertModel() function as a hidden layer. To do that I use the save() function provided by the tf.keras package.
But I got this error:
The error can be reproduced from my colab: https://colab.research.google.com/drive/18HYwffkXCylPqeA-8raL82vfwOjb-aLP
And another question: how should I call this model for prediction?
Thx for your help!