huggingface / pytorch-openai-transformer-lm

🐥A PyTorch implementation of OpenAI's finetuned transformer language model with a script to import the weights pre-trained by OpenAI
MIT License

Vocabulary size code explanation and occasional shape error #38

Closed Vimos closed 6 years ago

Vimos commented 6 years ago

In the model definition, vocab is used to set the size of the embedding table.

   vocab = n_vocab + n_special + n_ctx

I am guessing that n_ctx here accounts for the position embeddings, but it is still not clear to me.
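
For concreteness, my current reading of model_pytorch.py is roughly the following (a minimal sketch of the layout, not the exact source; the shapes are assumptions based on the released GPT weights):

```python
import torch
import torch.nn as nn

n_vocab, n_special, n_ctx, n_embd = 40478, 3, 512, 768

# One embedding table for everything: BPE tokens first, then the task-specific
# special tokens, then one row per position.
embed = nn.Embedding(n_vocab + n_special + n_ctx, n_embd)

# Each time step carries a (token_id, position_id) pair; the position ids are
# offset so that they index into the tail of the same table.
tokens = torch.randint(0, n_vocab, (1, 10))
positions = torch.arange(n_vocab + n_special, n_vocab + n_special + 10).unsqueeze(0)
x = torch.stack([tokens, positions], dim=-1)   # shape (1, 10, 2)
h = embed(x).sum(dim=2)                        # token embedding + position embedding
```

Is that the intended reading?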

In my case, I sometimes run into the following shape error if n_ctx is very large.

Traceback (most recent call last):
  File "/home/vimos/git/QA/pytorch-openai-transformer-lm/train.py", line 413, in <module>
    load_openai_pretrained_model(dh_model.transformer, n_ctx=n_ctx, n_special=n_special)
  File "/home/vimos/git/QA/pytorch-openai-transformer-lm/model_pytorch.py", line 402, in load_openai_pretrained_model
    assert model.embed.weight.shape == init_params[0].shape
AssertionError: (torch.Size([41140, 768]), (40993, 768))

Can anybody explain this part of the code? Should I restrict n_ctx to some maximum value? Thanks!
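
If I decompose the two sizes from the assertion, the mismatch does seem to come from n_ctx (assuming the standard 40478-token BPE vocabulary and n_special = 3):

```python
n_vocab, n_special = 40478, 3
n_vocab + n_special + 659   # = 41140, what my model allocates (so my n_ctx is 659)
n_vocab + n_special + 512   # = 40993, what the pretrained weights provide
```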

rodgzilla commented 6 years ago

n_ctx corresponds to the number of positions that the network can encode. In the paper, the authors mention this:

We train for 100 epochs on minibatches of 64 randomly sampled, contiguous sequences of 512 tokens.

This means that the network does not know how to encode positions beyond the 512th one, so 512 is the maximum value that n_ctx can take.
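
That is also exactly where the assertion error comes from: the released checkpoint only contains 512 position rows, so the loader can slice them down but never extend them. Here is a simplified sketch of what load_openai_pretrained_model builds (stand-in arrays, not the real checkpoint loading code):

```python
import numpy as np

n_embd, n_special = 768, 3
n_ctx = 659   # anything above 512 will trigger the assert

# Stand-ins with the shapes of the released arrays: 40478 BPE token rows
# and 512 position rows.
ckpt_tok_embed = np.zeros((40478, n_embd), dtype=np.float32)
ckpt_pos_embed = np.zeros((512, n_embd), dtype=np.float32)

init_embed = np.concatenate(
    [ckpt_tok_embed,                                                   # pretrained token embeddings
     (np.random.randn(n_special, n_embd) * 0.02).astype(np.float32),   # freshly initialised special tokens
     ckpt_pos_embed[:n_ctx]],                                          # slicing cannot go past 512 rows
    axis=0)

print(init_embed.shape)   # (40993, 768), while the model allocated (41140, 768)
```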

When using the model, all your inputs will have length n_ctx, so you should try to reduce its value as much as possible; this will give you large performance improvements in both training and inference time.
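
In practice you can derive n_ctx from your data instead of hard-coding it, along the lines of what train.py does (a rough sketch; encoded_train / encoded_valid are hypothetical lists of BPE-encoded examples, and the +3 accounts for the start/delimiter/classify tokens of a single-pair task):

```python
# Longest encoded example plus the special tokens, capped at the 512 positions
# available in the pretrained checkpoint.
max_seq_len = max(len(x) for x in encoded_train + encoded_valid)
n_ctx = min(max_seq_len + 3, 512)
```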

Vimos commented 6 years ago

Oh, I see. That explains why n_ctx is used to define the network structure and why dynamic sequence lengths do not work.

Thank you for the explanation!