huggingface / transfer-learning-conv-ai

🦄 State-of-the-Art Conversational AI with Transfer Learning

Error on GPT2-model: "generating special token with probability 1" #100

Open dnns92 opened 3 years ago

dnns92 commented 3 years ago

On a fresh install, I can't get the GPT-2 model to work properly. Maybe someone has had the same problem. Since the online bot also does not work, this may indicate that something is wrong with the pretrained model itself.
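For context, the warning appears to come from the resampling loop in `interact.py` (around line 81): while the generated sequence is shorter than `min_length`, the script rejects special tokens and resamples, but if the filtered distribution puts probability 1 on a special token the loop could never terminate, so it warns and breaks instead. A minimal sketch of that logic, with torch replaced by plain Python and hypothetical token ids, just to illustrate the mechanism (not the repo's exact code):

```python
import random
import warnings

# Hypothetical ids for the added special tokens (<bos>, <eos>, <pad>, ...)
SPECIAL_TOKEN_IDS = {50256, 50257, 50258}

def sample(probs):
    """Draw one token id from a {token_id: probability} distribution."""
    ids, weights = zip(*probs.items())
    return random.choices(ids, weights=weights, k=1)[0]

def resample_non_special(probs, min_length_active):
    """Sketch of interact.py's resampling loop: while below min_length,
    reject special tokens and redraw -- unless a special token holds
    probability 1, in which case warn and bail out to avoid spinning forever."""
    token = sample(probs)
    if min_length_active and token in SPECIAL_TOKEN_IDS:
        while token in SPECIAL_TOKEN_IDS:
            if max(probs.values()) == 1.0:
                warnings.warn(
                    "Warning: model generating special token with probability 1."
                )
                break
            token = sample(probs)
    return token
```

The warning firing on every turn would then mean the model's entire probability mass lands on a special token, which fits the hypothesis that the checkpoint was never fine-tuned on the dialogue task (the freshly added special-token embeddings are untrained).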

My Setup:

Bug description:

>>> How are you?

C:/Users/nano/Documents/repos/transfer-learning-conv-ai/interact.py:81: UserWarning: Warning: model generating special token with probability 1.
  warnings.warn("Warning: model generating special token with probability 1.")
>>> How are you now?

Full Scrollback:


C:\Users\nano\.conda\envs\convAI\python.exe -- C:/Users/nano/Documents/repos/transfer-learning-conv-ai/interact.py --model gpt2 --model_checkpoint gpt2
2021-01-18 16:57:56.320089: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2021-01-18 16:57:56.320235: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
INFO:C:/Users/nano/Documents/repos/transfer-learning-conv-ai/interact.py:Namespace(dataset_cache='./dataset_cache', dataset_path='', device='cpu', max_history=2, max_length=20, min_length=1, model='gpt2', model_checkpoint='gpt2', no_sample=False, seed=0, temperature=0.7, top_k=0, top_p=0.9)
INFO:C:/Users/nano/Documents/repos/transfer-learning-conv-ai/interact.py:Get pretrained model and tokenizer
INFO:transformers.tokenization_utils:loading file https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-vocab.json from cache at C:\Users\nano\.cache\torch\transformers\f2808208f9bec2320371a9f5f891c184ae0b674ef866b79c58177067d15732dd.1512018be4ba4e8726e41b9145129dc30651ea4fec86aa61f4b9f40bf94eac71
INFO:transformers.tokenization_utils:loading file https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-merges.txt from cache at C:\Users\nano\.cache\torch\transformers\d629f792e430b3c76a1291bb2766b0a047e36fae0588f9dbc1ae51decdff691b.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
INFO:transformers.configuration_utils:loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-config.json from cache at C:\Users\nano\.cache\torch\transformers\4be02c5697d91738003fb1685c9872f284166aa32e061576bbe6aaeb95649fcf.db13c9bc9c7bdd738ec89e069621d88e05dc670366092d809a9cbcac6798e24e
INFO:transformers.configuration_utils:Model config GPT2Config {
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "do_sample": false,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "eos_token_ids": null,
  "finetuning_task": null,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1"
  },
  "initializer_range": 0.02,
  "is_decoder": false,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1
  },
  "layer_norm_epsilon": 1e-05,
  "length_penalty": 1.0,
  "max_length": 20,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_layer": 12,
  "n_positions": 1024,
  "num_beams": 1,
  "num_labels": 2,
  "num_return_sequences": 1,
  "output_attentions": false,
  "output_hidden_states": false,
  "output_past": true,
  "pad_token_id": null,
  "pruned_heads": {},
  "repetition_penalty": 1.0,
  "resid_pdrop": 0.1,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "temperature": 1.0,
  "top_k": 50,
  "top_p": 1.0,
  "torchscript": false,
  "use_bfloat16": false,
  "vocab_size": 50257
}

INFO:transformers.modeling_utils:loading weights file https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-pytorch_model.bin from cache at C:\Users\nano\.cache\torch\transformers\4295d67f022061768f4adc386234dbdb781c814c39662dd1662221c309962c55.778cf36f5c4e5d94c8cd9cefcf2a580c8643570eb327f0d4a1f007fab2acbdf1
INFO:transformers.tokenization_utils:Adding <bos> to the vocabulary
INFO:transformers.tokenization_utils:Assigning <bos> to the bos_token key of the tokenizer
INFO:transformers.tokenization_utils:Adding <eos> to the vocabulary
INFO:transformers.tokenization_utils:Assigning <eos> to the eos_token key of the tokenizer
INFO:transformers.tokenization_utils:Adding <pad> to the vocabulary
INFO:transformers.tokenization_utils:Assigning <pad> to the pad_token key of the tokenizer
INFO:transformers.tokenization_utils:Adding <speaker1> to the vocabulary
INFO:transformers.tokenization_utils:Adding <speaker2> to the vocabulary
INFO:transformers.tokenization_utils:Assigning ['<speaker1>', '<speaker2>'] to the additional_special_tokens key of the tokenizer
INFO:C:/Users/nano/Documents/repos/transfer-learning-conv-ai/interact.py:Sample a personality
INFO:C:\Users\nano\Documents\repos\transfer-learning-conv-ai\utils.py:Load tokenized dataset from cache at ./dataset_cache_GPT2Tokenizer
INFO:C:/Users/nano/Documents/repos/transfer-learning-conv-ai/interact.py:Selected personality: i love to drink wine and dance in the moonlight.i remember when nobody had a television.i am very strong for my age.i feel like i might live forever.i am 100 years old.
>>> How are you?

C:/Users/nano/Documents/repos/transfer-learning-conv-ai/interact.py:81: UserWarning: Warning: model generating special token with probability 1.
  warnings.warn("Warning: model generating special token with probability 1.")
>>> How are you now?

>>>