salesforce / simpletod

Official repository for "SimpleTOD: A Simple Language Model for Task-Oriented Dialogue"
https://arxiv.org/abs/2005.00796
BSD 3-Clause "New" or "Revised" License

ValueError: Input <|endoftext|> is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers. #3

Open knhaller opened 3 years ago

knhaller commented 3 years ago

When running demo.py with 'gpt2' as the model, I came across this issue:

Loading Model
Traceback (most recent call last):
  File "demo.py", line 620, in <module>
    break_tokens = tokenizer.encode(tokenizer._eos_token) + tokenizer.encode('?') + tokenizer.encode('!')
  File "/balboa/projects/conv_ai/packages/transformers/src/transformers/tokenization_utils_base.py", line 1430, in encode
    **kwargs,
  File "/balboa/projects/conv_ai/packages/transformers/src/transformers/tokenization_utils_base.py", line 1742, in encode_plus
    **kwargs,
  File "/balboa/projects/conv_ai/packages/transformers/src/transformers/tokenization_utils.py", line 454, in _encode_plus
    first_ids = get_input_ids(text)
  File "/balboa/projects/conv_ai/packages/transformers/src/transformers/tokenization_utils.py", line 442, in get_input_ids
    f"Input {text} is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers."
ValueError: Input <|endoftext|> is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers.

The error went away when I changed line 620 to:

    break_tokens = tokenizer.encode(tokenizer.eos_token) + tokenizer.encode('?') + tokenizer.encode('!')
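A plausible explanation, not confirmed in the thread: in the transformers version shown in the traceback, the private _eos_token attribute appears to hold a special-token object rather than a plain str, while the public eos_token property returns a string conversion of it. Only the public property therefore passes encode()'s type check, and the error message still prints <|endoftext|> because formatting the object into the f-string calls its string conversion. The minimal sketch below reproduces that pattern; ToyAddedToken, ToyTokenizer and the tiny vocabulary are hypothetical stand-ins for illustration, not the transformers API.

```python
# Toy reproduction of why encode(tokenizer._eos_token) fails while
# encode(tokenizer.eos_token) works. ToyAddedToken and ToyTokenizer are
# hypothetical stand-ins, NOT the transformers classes.
from typing import List, Sequence, Union


class ToyAddedToken:
    """Hypothetical special-token wrapper (assumed analogue of AddedToken)."""

    def __init__(self, content: str) -> None:
        self.content = content

    def __str__(self) -> str:
        return self.content


class ToyTokenizer:
    def __init__(self) -> None:
        # Assumption: the private attribute holds a token object, not a str.
        self._eos_token = ToyAddedToken("<|endoftext|>")
        # Hypothetical vocabulary; 50256 is GPT-2's <|endoftext|> id,
        # the other two ids are illustrative only.
        self._vocab = {"<|endoftext|>": 50256, "?": 30, "!": 0}

    @property
    def eos_token(self) -> str:
        # The public property converts the stored object to a plain string.
        return str(self._eos_token)

    def encode(self, text: Union[str, Sequence[str], Sequence[int]]) -> List[int]:
        # Type check modeled on the one that raised the error in the traceback.
        if isinstance(text, str):
            return [self._vocab[text]]
        if isinstance(text, (list, tuple)):
            if all(isinstance(t, str) for t in text):
                return [self._vocab[t] for t in text]
            if all(isinstance(t, int) for t in text):
                return list(text)
        raise ValueError(
            f"Input {text} is not valid. Should be a string, a list/tuple of "
            "strings or a list/tuple of integers."
        )


if __name__ == "__main__":
    tokenizer = ToyTokenizer()
    try:
        # Passing the private attribute hands encode() a non-str object.
        tokenizer.encode(tokenizer._eos_token)
    except ValueError as err:
        print(err)  # Input <|endoftext|> is not valid. Should be a string, ...
    # The fix from the issue: use the public eos_token property instead.
    break_tokens = (
        tokenizer.encode(tokenizer.eos_token)
        + tokenizer.encode("?")
        + tokenizer.encode("!")
    )
    print(break_tokens)  # [50256, 30, 0]
```

With a real GPT-2 tokenizer, [tokenizer.eos_token_id] should give the same id as tokenizer.encode(tokenizer.eos_token) without re-encoding the string; that is a suggested alternative, not what demo.py does.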

fasterbuild commented 3 years ago

tokenizer._eos_token -> tokenizer.eos_token