huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

How to get .ckpt files for tensorflow DistilBERT model #2668

Closed · JKP0 closed this issue 4 years ago

JKP0 commented 4 years ago

model.save_pretrained('dir') saves the model as tf_model.h5; how can I get .ckpt files for it?

Poaz commented 4 years ago

Hello JKP0,

What do you need the .ckpt files for?

JKP0 commented 4 years ago

@Poaz We are working on NLG models for coreference resolution. We started the project with BERT, so our implementation depends on the pre-trained BERT model available from the Google API. Now we want to run the same study with DistilBERT. Our implementation is based on TensorFlow 1.14.0.

Actually, our requirement is something like below:

assignment_map, initialized_variable_names = modeling.get_assignment_map_from_checkpoint(tvars, config['tf_checkpoint']) # essential, unresolved 

init_from_checkpoint = tf.train.init_from_checkpoint if config['init_checkpoint'].endswith('ckpt') else load_from_pytorch_checkpoint  # essential, unresolved 

model.get_all_encoder_layers()  # essential for us, currently completely unresolved
model.get_sequence_output()   # essential for us, currently completely unresolved

However, no method like these (e.g. get_all_encoder_layers(), get_sequence_output(), get_assignment_map_from_checkpoint(), ...) seems to be implemented in the DistilBertModel class, as far as I can tell; I have checked a lot. In our earlier implementation we defined these methods ourselves, using tf.train.list_variables(init_checkpoint) and other TF-1 APIs, for which .ckpt files are essential.

Also, most of the TF-1 APIs expect a checkpoint (or serialized object) configuration, and we are unable to make that work with the non-sequential .h5 file produced by TFDistilBertModel. So for DistilBERT we need the same kind of files that are provided for BERT.
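
For clarity, this is roughly how our TF-1 pipeline consumes a checkpoint (a minimal sketch; the path is hypothetical, and the assignment-map helper comes from the Google BERT codebase, not from transformers):

import tensorflow as tf  # tf 1.14

init_checkpoint = "bert_model.ckpt"  # hypothetical path to a TF-1 checkpoint
# get_assignment_map_from_checkpoint builds its name mapping from exactly this listing
for name, shape in tf.train.list_variables(init_checkpoint):
    print(name, shape)
# The mapped variables are then restored into the graph, e.g.:
# tf.train.init_from_checkpoint(init_checkpoint, assignment_map)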

If you or anyone else can suggest a way around this, or a convenient way to get .ckpt files for DistilBERT, I would be very grateful. Thanks in advance!

Poaz commented 4 years ago

Okay, thanks for the context. If you are in any way able to use PyTorch for your implementation, you can get the outputs from all layers using the following code:

from transformers import DistilBertTokenizer, DistilBertModel, DistilBertConfig
import torch

# output_hidden_states=True makes the model return the hidden states of every layer
config = DistilBertConfig.from_pretrained('distilbert-base-uncased', output_hidden_states=True)
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
model = DistilBertModel.from_pretrained('distilbert-base-uncased', config=config)
model.eval()

input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)  # batch size 1
outputs = model(input_ids)

outputs[0] is then the final layer's output with shape (batch_size, seq_length, hidden_size), and outputs[1] is a tuple containing the hidden states of every layer (plus the embedding output), each with that same shape; the last element of the tuple corresponds to the last layer.
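
As a short continuation of the snippet above (same model and tokenizer; the counts assume DistilBERT's 6 transformer layers):

# Re-run the forward pass without gradient tracking and unpack the hidden states
with torch.no_grad():
    outputs = model(input_ids)

last_hidden_state = outputs[0]   # (batch_size, seq_length, hidden_size)
hidden_states = outputs[1]       # tuple: embedding output + one tensor per layer
print(len(hidden_states))        # 7 for distilbert-base-uncased
print(hidden_states[-1].shape)   # hidden state after the last layer, same as outputs[0]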

If that is not an option, it is possible to convert the .h5 file to .ckpt using Keras and TensorFlow.

For tf 1.x

import tensorflow as tf
from tensorflow import keras

# Load the model first so its variables exist; a Saver created earlier has nothing to save
model = keras.models.load_model("model.h5")
sess = keras.backend.get_session()
saver = tf.train.Saver()
save_path = saver.save(sess, "model.ckpt")

For tf 2.x

import tensorflow as tf
from tensorflow import keras

model = keras.models.load_model('model.h5', compile=False)
# Track the loaded model's variables in an object-based checkpoint
checkpoint = tf.train.Checkpoint(model=model)
save_path = checkpoint.save('model.ckpt')

Hope it helps!

JKP0 commented 4 years ago

@Poaz Your first idea is good, but it would force other changes on our side. The second one gives an error; we have tried a lot. The DistilBERT model saved by model.save_pretrained('dir') is not a sequential/serialized Keras model, and keras.models.load_model("model.h5") can only load .h5 files that contain the full serialized model.

To save the model:


import tensorflow as tf
from transformers import DistilBertTokenizer, TFDistilBertModel

tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
model = TFDistilBertModel.from_pretrained('distilbert-base-uncased')
input_ids = tf.constant(tokenizer.encode("Hello, my dog is cute"), dtype="int32")[None, :]  # Batch size 1
outputs = model(input_ids)
last_hidden_states = outputs[0]

model.save_pretrained("./DSB/")
model.save_weights("./DSB/DistDistilBERT_weights.h5")


> tf-1.14.0

import tensorflow as tf
from keras.models import load_model

saver = tf.train.Saver()
model = keras.models.load_model("DSB/tf_model.h5")
sess = keras.backend.get_session()
save_path = saver.save(sess, "/tmp/model.ckpt")

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-01f1268a6c60> in <module>()
----> 1 saver = tf.train.Saver()
      2 model = load_model("DSB/tf_model.h5")
      3 sess = keras.backend.get_session()
      4 save_path = saver.save(sess, "model.ckpt")

2 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py in __init__(self, var_list, reshape, sharded, max_to_keep, keep_checkpoint_every_n_hours, name, restore_sequentially, saver_def, builder, defer_build, allow_empty, write_version, pad_step_number, save_relative_paths, filename)
    823           time.time() + self._keep_checkpoint_every_n_hours * 3600)
    824     elif not defer_build:
--> 825       self.build()
    826     if self.saver_def:
    827       self._check_saver_def()

/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py in build(self)
    835     if context.executing_eagerly():
    836       raise RuntimeError("Use save/restore instead of build in eager mode.")
--> 837     self._build(self._filename, build_save=True, build_restore=True)
    838 
    839   def _build_eager(self, checkpoint_path, build_save, build_restore):

/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py in _build(self, checkpoint_path, build_save, build_restore)
    860           return
    861         else:
--> 862           raise ValueError("No variables to save")
    863       self._is_empty = False
    864 

ValueError: No variables to save

> tf-2.0.0

import tensorflow as tf
from tensorflow.keras.models import load_model

saver = tf.train.Checkpoint()
model = load_model('DSB/tf_model.h5', compile=False)
sess = tf.compat.v1.keras.backend.get_session()
save_path = saver.save('model.ckpt')

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-13dd44da36a5> in <module>()
      1 saver = tf.train.Checkpoint()
----> 2 model = load_model('DSB/tf_model.h5', compile=False)
      3 sess = tf.compat.v1.keras.backend.get_session()
      4 save_path = saver.save('model.ckpt')

1 frames
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/save.py in load_model(filepath, custom_objects, compile)
    144   if (h5py is not None and (
    145       isinstance(filepath, h5py.File) or h5py.is_hdf5(filepath))):
--> 146     return hdf5_format.load_model_from_hdf5(filepath, custom_objects, compile)
    147 
    148   if isinstance(filepath, six.string_types):

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/hdf5_format.py in load_model_from_hdf5(filepath, custom_objects, compile)
    163     model_config = f.attrs.get('model_config')
    164     if model_config is None:
--> 165       raise ValueError('No model found in config file.')
    166     model_config = json.loads(model_config.decode('utf-8'))
    167     model = model_config_lib.model_from_config(model_config,

ValueError: No model found in config file.

> tf-2.0.0

import tensorflow as tf
from keras.models import load_model

saver = tf.train.Checkpoint()
model = load_model('DSB/tf_model.h5', compile=False)
sess = tf.compat.v1.keras.backend.get_session()
save_path = saver.save('model.ckpt')


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-15-13dd44da36a5> in <module>()
      1 saver = tf.train.Checkpoint()
----> 2 model = load_model('DSB/tf_model.h5', compile=False)
      3 sess = tf.compat.v1.keras.backend.get_session()
      4 save_path = saver.save('model.ckpt')

3 frames
/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py in load_wrapper(*args, **kwargs)
    456                 os.remove(tmp_filepath)
    457             return res
--> 458         return load_function(*args, **kwargs)
    459 
    460     return load_wrapper

/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py in load_model(filepath, custom_objects, compile)
    548     if H5Dict.is_supported_type(filepath):
    549         with H5Dict(filepath, mode='r') as h5dict:
--> 550             model = _deserialize_model(h5dict, custom_objects, compile)
    551     elif hasattr(filepath, 'write') and callable(filepath.write):
    552         def load_function(h5file):

/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py in _deserialize_model(h5dict, custom_objects, compile)
    237         return obj
    238 
--> 239     model_config = h5dict['model_config']
    240     if model_config is None:
    241         raise ValueError('No model found in config.')

/usr/local/lib/python3.6/dist-packages/keras/utils/io_utils.py in __getitem__(self, attr)
    316             else:
    317                 if self.read_only:
--> 318                     raise ValueError('Cannot create group in read-only mode.')
    319                 val = H5Dict(self.data.create_group(attr))
    320         return val

ValueError: Cannot create group in read-only mode.
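
A quick check with h5py (installed alongside TensorFlow) confirms that tf_model.h5 stores only the weights and no architecture, which is exactly what the "No model found in config file" error above complains about:

import h5py

with h5py.File("DSB/tf_model.h5", "r") as f:
    print("model_config" in f.attrs)  # False: weights only, no model architecture
    print(list(f.attrs.keys()))       # typically just layer_names / backend / keras_version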

Poaz commented 4 years ago

I see. The .h5 file does not contain the model structure, so the model cannot be recreated from it. That means the model would need to be rebuilt in Keras for that method to work, which is probably not feasible for you.
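
If it helps, a possible workaround (just a sketch, not tested against your TF 1.14 pipeline) is to skip keras.models.load_model entirely: let transformers rebuild the architecture with from_pretrained, which restores the weights from ./DSB/tf_model.h5, and then write checkpoint files from the live variables. Note that the variable names in the resulting checkpoint follow transformers' naming rather than the Google BERT layout, so your assignment map would still need adjusting:

import tensorflow as tf  # tf 2.x
from transformers import TFDistilBertModel

# Rebuild the Keras model and load the weights saved earlier with save_pretrained("./DSB/")
model = TFDistilBertModel.from_pretrained("./DSB/")

# Track the rebuilt model in an object-based checkpoint and write .ckpt files
checkpoint = tf.train.Checkpoint(model=model)
save_path = checkpoint.save("./DSB/ckpt/distilbert")  # writes distilbert-1.index / .data-* files
print(save_path)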

divyag11 commented 4 years ago

Hey, you can load the model as: loaded_model = TFDistilBertForSequenceClassification.from_pretrained("directory")
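
For completeness, a minimal round trip with the directory saved above (a sketch; since ./DSB/ was saved from a bare TFDistilBertModel, the sequence-classification head would be newly initialized):

import tensorflow as tf
from transformers import DistilBertTokenizer, TFDistilBertForSequenceClassification

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
loaded_model = TFDistilBertForSequenceClassification.from_pretrained("./DSB/")  # the save_pretrained directory

input_ids = tf.constant(tokenizer.encode("Hello, my dog is cute"), dtype="int32")[None, :]
outputs = loaded_model(input_ids)
print(outputs[0].shape)  # (1, num_labels) classification logits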

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

varunSabnis commented 2 years ago

@JKP0 were you able to solve the issue?

kusumlata123 commented 2 years ago

How did you solve this problem? Can anyone help with this? How do I get .ckpt files for muril-base-cased/tf_model.h5?