huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

TF GPT2 Language model can't be created with from_pretrained() for specific shortcut name #3200

Closed bilal2vec closed 4 years ago

bilal2vec commented 4 years ago

🐛 Bug

Information

Model I am using (Bert, XLNet ...): TFGPT2LMHeadModel

The Colab notebook works for all model sizes except gpt2-xl, where it throws an error. It looks like from_pretrained() can't resolve the correct checkpoint from the shortcut name (gpt2-xl).

I tried running the Colab notebook with the other GPT-2 models, and they all work.

Stack trace:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-068b0d38bee3> in <module>()
      1 strategy = tf.distribute.experimental.TPUStrategy(resolver)
      2 with strategy.scope():
----> 3   model = create_model()
      4 
      5   loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

<ipython-input-7-f6b9ea32b94a> in create_model()
      1 def create_model():
----> 2   return TFGPT2LMHeadModel.from_pretrained('gpt2-xl')

/usr/local/lib/python3.6/dist-packages/transformers/modeling_tf_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    401         model(model.dummy_inputs, training=False)  # build the network with dummy inputs
    402 
--> 403         assert os.path.isfile(resolved_archive_file), "Error retrieving file {}".format(resolved_archive_file)
    404         # 'by_name' allow us to do transfer learning by skipping/adding layers
    405         # see https://github.com/tensorflow/tensorflow/blob/00fad90125b18b80fe054de1055770cfb8fe4ba3/tensorflow/python/keras/engine/network.py#L1339-L1357

/usr/lib/python3.6/genericpath.py in isfile(path)
     28     """Test whether a path is a regular file"""
     29     try:
---> 30         st = os.stat(path)
     31     except OSError:
     32         return False

TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
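The assert on line 403 never actually fires: resolved_archive_file ends up as None because no TF checkpoint can be resolved for gpt2-xl, and os.path.isfile() raises the TypeError from os.stat() before the assertion message is ever shown. A minimal stdlib-only sketch of that failure mode (no transformers needed):

```python
import os

# What from_pretrained ends up with when no TF checkpoint URL resolves
resolved_archive_file = None

try:
    # Mirrors the failing call in modeling_tf_utils.py: os.path.isfile()
    # calls os.stat(), which rejects None before the assert can report anything
    assert os.path.isfile(resolved_archive_file), "Error retrieving file {}".format(resolved_archive_file)
except TypeError as err:
    print(type(err).__name__)  # TypeError, matching the traceback above
```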

Language I am using the model on (English, Chinese ...): English

The problem arises when using:

See the Colab notebook: https://colab.research.google.com/drive/12gEGdxUjyVLBSUjkjngAWiE_ENIUIV8o

The task I am working on is:

Fine-tuning gpt2-xl on wikitext2

To reproduce

Run the Colab notebook.

Expected behavior

All GPT-2 model sizes, including gpt2-xl, should load successfully with from_pretrained().

Environment info

bilal2vec commented 4 years ago

For some reason there isn't a pretrained TF checkpoint for gpt2-xl here, but there is one for PyTorch here

Fixing this should only involve converting the PyTorch checkpoint to a TF one. I'd be happy to do it myself if there is a conversion script that can convert PyTorch checkpoints to TF.

bilal2vec commented 4 years ago

Converting a PyTorch checkpoint to TF works with:

from transformers import GPT2LMHeadModel, TFGPT2LMHeadModel

# Download the PyTorch weights and save them locally
model = GPT2LMHeadModel.from_pretrained('gpt2-xl')
model.save_pretrained('./')

# Reload the local PyTorch checkpoint into the TF model, then re-save as TF
model = TFGPT2LMHeadModel.from_pretrained('./', from_pt=True)
model.save_pretrained('./out')
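For what it's worth, save_pretrained() writes the weights under the conventional file names (pytorch_model.bin for PyTorch, tf_model.h5 for TF 2 at the time of this issue), so the output directory can be sanity-checked with a tiny helper like this (hypothetical, not part of transformers):

```python
import os

def checkpoint_kind(directory):
    """Guess which framework a save_pretrained() directory holds weights for.

    Assumes the file-name conventions transformers used at the time:
    pytorch_model.bin for PyTorch, tf_model.h5 for TF 2.
    """
    files = set(os.listdir(directory))
    if "tf_model.h5" in files:
        return "tensorflow"
    if "pytorch_model.bin" in files:
        return "pytorch"
    return None
```

After the conversion above, checkpoint_kind('./out') should report 'tensorflow'.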

If you can tell me where to upload the TF checkpoint to, I'll open up a pull request

patrickvonplaten commented 4 years ago

Hi @bkkaggle thanks for pointing this out! @julien-c could you maybe help out here:

While the model:

"gpt2-xl": "https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-xl-pytorch_model.bin",

does exist for PyTorch, it does not exist for TF 2. Could we add it as well?
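Assuming the TF 2 weights would follow the same S3 naming convention as the other GPT-2 sizes (a tf_model.h5 suffix in place of pytorch_model.bin — an assumption, not confirmed for gpt2-xl), the missing URL would just be a suffix swap:

```python
# Hypothetical: derive the expected TF 2 URL from the known PyTorch one,
# assuming the usual suffix convention (not confirmed for gpt2-xl)
pt_url = "https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-xl-pytorch_model.bin"
tf_url = pt_url.replace("pytorch_model.bin", "tf_model.h5")
print(tf_url)
```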