huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Save model wrapped in Keras #2733

Closed aollagnier closed 3 years ago

aollagnier commented 4 years ago

Hi all,

Sorry for the naive question, but I am trying to save my Keras model (<class 'tensorflow.python.keras.engine.training.Model'>), in which I use TFBertModel as a hidden layer. To do so I use the save() method provided by tf.keras.

But I got this error:

---------------------------------------------------------------------------

NotImplementedError                       Traceback (most recent call last)

<ipython-input-13-3b315f7219da> in <module>()
----> 1 model.save('model_weights.h5')

8 frames

/tensorflow-2.1.0/python3.6/tensorflow_core/python/keras/engine/network.py in get_config(self)
    915   def get_config(self):
    916     if not self._is_graph_network:
--> 917       raise NotImplementedError
    918     return copy.deepcopy(get_network_config(self))
    919 

NotImplementedError: 

The error can be reproduced from my Colab: https://colab.research.google.com/drive/18HYwffkXCylPqeA-8raL82vfwOjb-aLP

Another question: how should I call this model for prediction?

Thx for your help!

sainimohit23 commented 4 years ago

Same problem.

LysandreJik commented 4 years ago

Which version are you running? Is it possible that this fix resolved your issue? Can you try installing from master to check?

gthb commented 4 years ago

This doesn't look like the same thing I was fixing in #3103 so I doubt that that helped.

gthb commented 4 years ago

In particular, from the Network docstring:

  Two types of `Networks` exist: Graph Networks and Subclass Networks. Graph
  networks are used in the Keras Functional and Sequential APIs. Subclassed
  networks are used when a user subclasses the `Model` class. In general,
  more Keras features are supported with Graph Networks than with Subclassed
  Networks, specifically:

  - Model cloning (`keras.models.clone`)
  - Serialization (`model.get_config()/from_config`, `model.to_json()/to_yaml()`
  - Whole-model saving (`model.save()`)

Based on the traceback, apparently the model is a subclass model, so it needs to override get_config in order to support serialization. (The fix in #3103 is for a problem with using TF*MainLayer classes within a Keras model, so it doesn't address this.)
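For illustration, here is a minimal sketch of what "override get_config" means, assuming TF 2.x with tf.keras. It uses a hypothetical custom layer (ScaledDense is not part of transformers) whose constructor takes extra arguments. get_config reports those arguments so that the default from_config, which calls the constructor with the returned dict, can rebuild the object; a subclassed Model has to satisfy the same contract before model.save() can serialize it.

```python
import tensorflow as tf


class ScaledDense(tf.keras.layers.Layer):
    """Hypothetical custom layer whose constructor takes extra arguments."""

    def __init__(self, units, scale=1.0, **kwargs):
        super().__init__(**kwargs)  # forwards name, trainable, dtype
        self.units = units
        self.scale = scale
        self.dense = tf.keras.layers.Dense(units)

    def call(self, inputs):
        return self.dense(inputs) * self.scale

    def get_config(self):
        # Start from the base config (name, trainable, dtype) and add every
        # constructor argument, so from_config can call ScaledDense(**config).
        config = super().get_config()
        config.update({"units": self.units, "scale": self.scale})
        return config


layer = ScaledDense(4, scale=2.0, name="scaled")
clone = ScaledDense.from_config(layer.get_config())  # rebuilt from config alone
out = clone(tf.ones((1, 3)))
print(clone.name, clone.units, clone.scale, tuple(out.shape))
```

Note that from_config only restores the architecture; the clone gets freshly initialized weights, which is why whole-model saving needs both the config and the weights.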

sainimohit23 commented 4 years ago

@gthb so is there any way to save the models wrapped in keras?

gthb commented 4 years ago

> @gthb so is there any way to save the models wrapped in keras?

I'm sure there's some way, just a question of how much custom work you have to do (probably some, given the above quote).

But are you sure you need to use TFBertModel, and not TFBertMainLayer, for your hidden layer? TFBertModel is literally just this (plus docstrings):

class TFBertModel(TFBertPreTrainedModel):
    def __init__(self, config, *inputs, **kwargs):
        super().__init__(config, *inputs, **kwargs)
        self.bert = TFBertMainLayer(config, name="bert")

    def call(self, inputs, **kwargs):
        outputs = self.bert(inputs, **kwargs)
        return outputs

... so unless you need something in particular from TFBertModel's superclasses, maybe using TFBertMainLayer directly would simplify things for you?

aollagnier commented 4 years ago

Thanks @gthb for your reply. I've updated my Colab, and it now works after I changed the following line:

model = TFBertModel.from_pretrained('bert-base-cased', config=config)

to:

model = TFBertMainLayer(config=config)

However, I can no longer call from_pretrained. Is the class implicitly set by providing the config options from BertConfig?

Another point: I am facing a problem while training the model when it is wrapped in Keras. Using

embedding = model([word_inputs, mask_inputs, seg_inputs])[0]

I get:

tensorflow:Gradients do not exist for variables ['tf_bert_main_layer/pooler/dense/kernel:0', 'tf_bert_main_layer/pooler/dense/bias:0'] when minimizing the loss.

I would like to combine layers from transformers with a CNN (which requires 3D tensors as input), but in order to keep the weights learned by the model I tried the pooler output (which provides 2D tensors):

model([word_inputs, mask_inputs, seg_inputs])[1]

but it doesn't fit the CNN:

ValueError: Input 0 of layer input is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 768]

Do you have an idea how I should reshape it to fit a Conv1D layer? The error can be reproduced from my Colab: https://colab.research.google.com/drive/18HYwffkXCylPqeA-8raL82vfwOjb-aLP

gthb commented 4 years ago

> I can't call the function from_pretrained. Is the class implicitly set by providing the config options from BERTConfig?

I'm guessing you mean that TFBertMainLayer does not have a from_pretrained method. Yep, but BertConfig does, so this works:

from transformers import BertConfig, TFBertMainLayer
config_name = "bert-base-uncased"  # for instance
config = BertConfig.from_pretrained(config_name)
main_layer = TFBertMainLayer(config)

> Do you have an idea how I should reshape it to fit with a conv1D layer?

Isn't your Conv1D layer intended to convolve over the token sequence? The pooled output produces a single vector representing the whole sequence, not separate vectors for each token of the sequence. So you are probably mistaken in trying to use the pooled output (or I'm not understanding your intent).
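To make the ndim mismatch concrete, here is a minimal pure-Python sketch that traces shapes through the layers of the model posted later in this thread (Conv1D(128, kernel_size=5), MaxPooling1D(), LSTM(128)), assuming Keras's default padding and pooling settings. max_seq_length=128 is an assumption; 768 is the hidden size from the error message above. Conv1D needs the 3-D last_hidden_state, while the 2-D pooler_output has no sequence axis to convolve over.

```python
def conv1d_len(seq_len: int, kernel_size: int, stride: int = 1) -> int:
    # Conv1D defaults: padding="valid", strides=1
    return (seq_len - kernel_size) // stride + 1


def maxpool1d_len(seq_len: int, pool_size: int = 2) -> int:
    # MaxPooling1D defaults: pool_size=2, strides=pool_size, padding="valid"
    return (seq_len - pool_size) // pool_size + 1


def trace_shapes(max_seq_length: int, hidden_size: int = 768) -> dict:
    """Trace (batch, ...) shapes; None stands for the unknown batch dimension."""
    shapes = {
        "last_hidden_state": (None, max_seq_length, hidden_size),  # output [0]
        "pooler_output": (None, hidden_size),                      # output [1]
    }
    conv_len = conv1d_len(max_seq_length, kernel_size=5)
    shapes["conv1d"] = (None, conv_len, 128)
    shapes["max_pooling1d"] = (None, maxpool1d_len(conv_len), 128)
    shapes["lstm"] = (None, 128)  # LSTM(128) returns only the last state
    return shapes


for name, shape in trace_shapes(max_seq_length=128).items():
    print(f"{name:18} ndim={len(shape)} {shape}")
```

Only the per-token output keeps a time axis for Conv1D, MaxPooling1D and LSTM to consume; reshaping the pooled vector would fabricate a sequence axis rather than recover one.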

aollagnier commented 4 years ago

Yes, you're right, I misunderstood the nature of the pooler output (I was probably misled by these related topics: #2256 and #1727). So when I use last_hidden_state I get this warning: tensorflow:Gradients do not exist for variables ['tf_bert_main_layer/pooler/dense/kernel:0', 'tf_bert_main_layer/pooler/dense/bias:0'] when minimizing the loss.

The model does seem to train. However, when I load it I get:

 File "/home/X/", line 69, in train
    loaded_model = tf.keras.models.load_model(dirModel+self.options.t+'cnn.h5')
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/save.py", line 146, in load_model
    return hdf5_format.load_model_from_hdf5(filepath, custom_objects, compile)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 193, in load_model_from_hdf5
    model._make_train_function()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 2057, in _make_train_function
    params=self._collected_trainable_weights, loss=self.total_loss)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/optimizer_v2/optimizer_v2.py", line 503, in get_updates
    grads = self.get_gradients(loss, params)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/optimizer_v2/optimizer_v2.py", line 397, in get_gradients
    "K.argmax, K.round, K.eval.".format(param))
ValueError: Variable <tf.Variable 'tf_bert_main_layer_1/pooler/dense/kernel:0' shape=(768, 768) dtype=float32> has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval. 

Here is the model used:

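    import tensorflow as tf
    from transformers import BertConfig, TFBertMainLayer

    # Placeholder values (assumed; not given in the original post)
    max_seq_length = 128
    optimizer = tf.keras.optimizers.Adam()
    loss = 'binary_crossentropy'  # matches the single sigmoid output below

    # Define inputs
    word_inputs = tf.keras.layers.Input(shape=(max_seq_length,), name='word_inputs', dtype='int32')
    mask_inputs = tf.keras.layers.Input(shape=(max_seq_length,), name='mask_inputs', dtype='int32')
    seg_inputs = tf.keras.layers.Input(shape=(max_seq_length,), name='seg_inputs', dtype='int32')

    # Call BERT model
    config_name = "bert-base-uncased"  # for instance
    config = BertConfig.from_pretrained(config_name)
    main_layer = TFBertMainLayer(config)
    # [0] is last_hidden_state: one vector per token, shape (batch, max_seq_length, hidden)
    embedding = main_layer([word_inputs, mask_inputs, seg_inputs])[0]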

    conv=tf.keras.layers.Conv1D(128, kernel_size=5, activation='relu', name="input")(embedding)
    pooling = tf.keras.layers.MaxPooling1D()(conv)
    lstm = tf.keras.layers.LSTM(128)(pooling)
    dense = tf.keras.layers.Dense(64, activation='relu')(lstm)

    # Final output 
    outputs = tf.keras.layers.Dense(1, activation='sigmoid', name='outputs')(dense)

    # Compile model
    model = tf.keras.Model(inputs=[word_inputs, mask_inputs, seg_inputs], outputs=outputs)
    model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])

    model.save('cnn.h5')
    loaded_model = tf.keras.models.load_model('cnn.h5')

So what am I doing wrong?
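A likely cause, offered as an explanation rather than something confirmed in this thread: because only output [0] (last_hidden_state) feeds the loss, the pooler's dense kernel and bias are never on the loss path, so TensorFlow computes a None gradient for them. During fit this surfaces as the warning above; the traceback suggests the h5 loader's _make_train_function treats the same None gradient as an error. The sketch below reproduces the underlying behavior with two plain Dense layers, hypothetical stand-ins for the sequence-output path and the pooler, assuming TF 2.x.

```python
import tensorflow as tf

# Hypothetical stand-ins: `used` plays the path to last_hidden_state,
# `pooler_like` plays BERT's pooler, whose output is never consumed.
used = tf.keras.layers.Dense(2, name="used")
pooler_like = tf.keras.layers.Dense(2, name="pooler_like")

x = tf.ones((3, 4))
used(x)         # build both layers so they own variables
pooler_like(x)

with tf.GradientTape() as tape:
    # Only `used` contributes to the loss, like embedding = main_layer(...)[0].
    loss = tf.reduce_sum(used(x))

variables = used.trainable_variables + pooler_like.trainable_variables
grads = tape.gradient(loss, variables)

for var, grad in zip(variables, grads):
    print(var.name, "None" if grad is None else tuple(grad.shape))
```

The variables of the unused layer still count as trainable weights, which is why an optimizer that collects all trainable weights runs into them.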

AleksTk commented 4 years ago

@gthb

> ... so unless you need something in particular from TFBertModel's superclasses, maybe using TFBertMainLayer directly would simplify things for you?

Simply initializing TFBertMainLayer as

   main_layer = TFBertMainLayer(config)

won't load pretrained parameters, unlike TFBertModel.from_pretrained(...), right?

gthb commented 4 years ago

> won't load pretrained parameters as opposed to TFBertModel.from_pretrained(...), right?

Oops, yes, there's that little thing! 😄 You can load the weights e.g. like this:

from transformers import TFBertPreTrainedModel
from transformers.file_utils import cached_path

bert_weights_file = TFBertPreTrainedModel.pretrained_model_archive_map[config_name]
bert_weights_file = cached_path(bert_weights_file)
model.load_weights(bert_weights_file, by_name=True)

FullDataAlchemist commented 4 years ago

> won't load pretrained parameters as opposed to TFBertModel.from_pretrained(...), right?
>
> Oops, yes, there's that little thing! You can load the weights e.g. like this:
>
> bert_weights_file = TFBertPreTrainedModel.pretrained_model_archive_map[config_name]
> bert_weights_file = cached_path(bert_weights_file)
> model.load_weights(bert_weights_file, by_name=True)

I'm getting this error using transformers version 2.11.0:

AttributeError: type object 'TFBertPreTrainedModel' has no attribute 'pretrained_model_archive_map'

I'm using this code:

config = BertConfig.from_pretrained(config_name)
bert_weights_file = TFBertPreTrainedModel.pretrained_model_archive_map[config_name]

gthb commented 4 years ago

@PoriNiki yeah, from a quick git log -S pretrained_model_archive_map, that attribute went away in https://github.com/huggingface/transformers/pull/4636 “Kill model archive maps”, which was merged to master in https://github.com/huggingface/transformers/commit/d4c2cb402d6674211726fd5f4803d1090664e438 and first released in v2.11.0.

By staring at TFPreTrainedModel.from_pretrained a bit, the right way ought to be something like:

from transformers.file_utils import cached_path, hf_bucket_url, TF2_WEIGHTS_NAME
bert_weights_file_url = hf_bucket_url(config_name, filename=TF2_WEIGHTS_NAME)
bert_weights_file = cached_path(bert_weights_file_url)

(not tested)
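To tell which side of that change an installed version falls on, a small check along these lines may help. This is a hypothetical helper, not part of transformers; it only encodes the fact, from the git history cited above, that pretrained_model_archive_map was removed in v2.11.0, and it says nothing about which later loading API is correct.

```python
def has_archive_map(transformers_version: str) -> bool:
    """Return True if `pretrained_model_archive_map` should still exist.

    Hypothetical helper: the attribute was removed in transformers v2.11.0,
    so any version before 2.11 is assumed to still provide it.
    """
    major, minor = (int(part) for part in transformers_version.split(".")[:2])
    return (major, minor) < (2, 11)


for version in ["2.4.1", "2.10.0", "2.11.0", "3.0.2"]:
    print(version, has_archive_map(version))
```

In practice it would be called as has_archive_map(transformers.__version__) after import transformers.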

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

ch-hristov commented 4 years ago

I still have this issue. I can't save my model, only its weights.
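One workaround consistent with this (a sketch, not necessarily the approach in the Stack Overflow answer linked in this thread): keep the model-building code in a function, save only the weights, and rebuild the architecture in code before loading them, so no get_config is ever needed. The toy build_model below is a hypothetical stand-in for the BERT + Conv1D model. It assumes TF 2.x, and uses the .weights.h5 suffix because recent Keras versions require it for save_weights.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def build_model(seq_len=16, vocab_size=100):
    """Hypothetical stand-in: any function that rebuilds the same architecture."""
    word_inputs = tf.keras.layers.Input(shape=(seq_len,), dtype="int32", name="word_inputs")
    x = tf.keras.layers.Embedding(vocab_size, 8, name="embedding")(word_inputs)
    x = tf.keras.layers.Conv1D(4, kernel_size=3, activation="relu", name="conv")(x)
    x = tf.keras.layers.GlobalMaxPooling1D(name="pool")(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid", name="outputs")(x)
    return tf.keras.Model(inputs=word_inputs, outputs=outputs)


model = build_model()
batch = np.random.randint(0, 100, size=(2, 16)).astype("int32")
before = model.predict(batch, verbose=0)

path = os.path.join(tempfile.mkdtemp(), "model.weights.h5")
model.save_weights(path)      # weights only: no architecture serialization

restored = build_model()      # rebuild the architecture from code
restored.load_weights(path)   # then restore the trained values
after = restored.predict(batch, verbose=0)
print(np.allclose(before, after))
```

The trade-off is that the building code must be available wherever the model is loaded, since the saved file carries no architecture.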

dmlicht commented 4 years ago

For other people (@ch-hristov) still having trouble with this, I wrote up an explanation and workarounds on Stack Overflow: https://stackoverflow.com/questions/62482511/tfbertmainlayer-gets-less-accuracy-compared-to-tfbertmodel/64000378#64000378 It seems like it would be useful to smooth out this workflow, as many Keras users will run into this issue when they try to save their model. @gthb What do you think about adding something like from_pretrained to MainLayer, and factoring out the logic of TFPreTrainedModel.from_pretrained so that it supports both?

gthb commented 4 years ago

Sounds good, but I have just switched jobs and am not using transformers, don't really have the cycles to help, sorry!

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

saboof commented 3 years ago

Hi,

I'm also encountering this issue and couldn't get the solution by @dmlicht to work yet. Can anyone provide more feedback on that?

Also, will this issue be addressed by the HF team?