Closed by jeswan 4 years ago
Comment by sleepinyourhat Wednesday Apr 29, 2020 at 14:43 GMT
@zphang, any guesses (since you were recently working on XLM-R)?
FWIW, it seems odd that the log is showing the tokenizer name "XLM_en": the name that appears in our code is the lowercase "xlm_en":
https://github.com/nyu-mll/jiant/search?q=XLM_en&unscoped_q=XLM_en
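For reference, a minimal sketch of how a casing mismatch in that key could trip the assertion in the traceback below; the dict literal and variable names here are illustrative stand-ins, not jiant's exact internals:

# Illustrative only: the batch dict is keyed by one tokenizer name,
# while the forward pass looks it up under a differently-cased name.
sent = {"xlm_en": [2, 19, 5, 119, 3]}  # batch indexed under the lowercase name
tokenizer_required = "XLM_en"          # name the embedder module expects

# This membership check is the kind of check that fails in correct_sent_indexing:
assert tokenizer_required in sent, (
    "transformers cannot find correcpondingly tokenized input"
)

Running this sketch raises the same AssertionError message that appears in the log below.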
Issue by tejasvi96 Tuesday Apr 28, 2020 at 07:34 GMT Originally opened as https://github.com/nyu-mll/jiant/issues/1079
I was trying to use the XLM model as an input_module for a language modelling task, and I am getting this error:
04/28 12:46:07 PM: Fatal error in main(): Traceback (most recent call last):
  File "main.py", line 16, in <module>
    main(sys.argv[1:])
  File "D:\My\jiant\jiant\__main__.py", line 588, in main
    phase="pretrain",
  File "D:\My\jiant\jiant\trainer.py", line 579, in train
    output_dict = self._forward(batch, task=task)
  File "D:\My\jiant\jiant\trainer.py", line 1043, in _forward
    model_out = self._model.forward(task, batch)
  File "D:\My\jiant\jiant\models.py", line 865, in forward
    out = self._single_sentence_forward(batch, task, predict)
  File "D:\My\jiant\jiant\models.py", line 937, in _single_sentence_forward
    word_embs_in_context, sent_mask = self.sent_encoder(batch["input1"], task)
  File "C:\Users\Tejasvi\Anaconda3\envs\jiant\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\My\jiant\jiant\modules\sentence_encoder.py", line 93, in forward
    word_embs_in_context = self._highway_layer(self._text_field_embedder(sent))
  File "C:\Users\Tejasvi\Anaconda3\envs\jiant\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\My\jiant\jiant\huggingface_transformers_interface\modules.py", line 719, in forward
    ids, input_mask = self.correct_sent_indexing(sent)
  File "D:\My\jiant\jiant\huggingface_transformers_interface\modules.py", line 107, in correct_sent_indexing
    ), "transformers cannot find correcpondingly tokenized input"
AssertionError: transformers cannot find correcpondingly tokenized input
(The same traceback is then printed a second time when main.py line 27 re-raises the exception so an attached debugger can catch it.)
On printing the output of sent from here (GitHub), I get:

sent: {'words': tensor([[  2,  19,   5, 119,   6,   1,  10, 137,  48,   3],
                        [  2,  18,   5, 387, 908, 162,   8,  87,  13,   3],
                        [  2,  74, 245, 141,   5,   1, 886,  44,   1,   3],
                        [  2,  20,   1,   1,   7,   1, 435,  24,  14,   3]])}

and self.tokenizer_required is XLM_en.
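One quick way to confirm the mismatch is to compare the two directly just above the failing assert. This is a hypothetical debugging sketch; it assumes sent and self.tokenizer_required are exactly the objects printed above:

# Hypothetical debugging lines, dropped in above the assert in correct_sent_indexing:
print("keys in sent:", list(sent.keys()))        # ['words']
print("key expected:", self.tokenizer_required)  # 'XLM_en'
print("match:", self.tokenizer_required in sent) # False, hence the AssertionError

If "match" prints False, the batch was indexed under a different key (here the plain 'words' indexer) than the one the XLM embedder looks up.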
I have used these settings in the tutorial.conf file:

exp_name = jiant-demo
run_name = mtl-sst-mrpc
random_seed = 42
load_model = 0
reload_tasks = 0
reload_indexing = 0
reload_vocab = 0
pretrain_tasks = "sst"
target_tasks = "sts-b"
classifier = log_reg
classifier_hid_dim = 32
max_seq_len = 33
max_word_v_size = 8000
pair_attn = 0
input_module = xlm-mlm-en-2048
d_word = 300
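One thing worth checking against this config is whether a tokenizer is pinned explicitly anywhere in the config chain. The following is a sketch, not a verified fix, under the assumption that jiant's defaults.conf exposes a tokenizer option whose auto value derives the tokenizer from input_module:

input_module = xlm-mlm-en-2048
tokenizer = auto  // sketch: let jiant derive the matching tokenizer instead of naming one by hand

An explicit tokenizer value that does not match input_module would leave the batches indexed under the wrong key, consistent with the sent printout above.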
What could be the possible reason for this?