huawei-noah / Pretrained-Language-Model

Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.

[TinyBERT] ERROR running task_distill during task-specific distillation for a Chinese task #32

Closed vigosser closed 3 years ago

vigosser commented 4 years ago

The error happened during task-specific distillation; the traceback is at the end. The fine-tuned BERT teacher model was generated with the transformers package, starting from the bert-base-chinese model that ships with that package.

Is this because the released TinyBERT model was trained on a corpus that does not include Chinese?

The fine-tuning command using transformers is as follows:

python run_glue.py \
  --model_type bert \
  --model_name_or_path bert-base-chinese \
  --task_name sst-2 \
  --do_train \
  --do_eval \
  --do_lower_case \
  --data_dir /home/vigosser/nvidia/bert/data/final \
  --max_seq_length 128 \
  --per_gpu_train_batch_size 8 \
  --learning_rate 15e-6 \
  --num_train_epochs 3.0 \
  --output_dir /home/vigosser/TinyBERT/FT_bert

Traceback


Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm 2019.2.2\helpers\pydev\pydevd.py", line 2066, in <module>
    main()
  File "C:\Program Files\JetBrains\PyCharm 2019.2.2\helpers\pydev\pydevd.py", line 2060, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files\JetBrains\PyCharm 2019.2.2\helpers\pydev\pydevd.py", line 1411, in run
    return self._exec(is_module, entry_point_fn, module_name, file, globals, locals)
  File "C:\Program Files\JetBrains\PyCharm 2019.2.2\helpers\pydev\pydevd.py", line 1418, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm 2019.2.2\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "D:/github/TinyBERT/task_distill.py", line 1154, in <module>
    main()
  File "D:/github/TinyBERT/task_distill.py", line 1013, in main
    teacher_logits, teacher_atts, teacher_reps = teacher_model(input_ids, segment_ids, input_mask)
  File "C:\Users\vigosser\Anaconda3\envs\vai\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\github\TinyBERT\transformer\modeling.py", line 1133, in forward
    output_all_encoded_layers=True, output_att=True)
  File "C:\Users\vigosser\Anaconda3\envs\vai\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\github\TinyBERT\transformer\modeling.py", line 832, in forward
    embedding_output = self.embeddings(input_ids, token_type_ids)
  File "C:\Users\vigosser\Anaconda3\envs\vai\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\github\TinyBERT\transformer\modeling.py", line 357, in forward
    words_embeddings = self.word_embeddings(input_ids)
  File "C:\Users\vigosser\Anaconda3\envs\vai\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\vigosser\Anaconda3\envs\vai\lib\site-packages\torch\nn\modules\sparse.py", line 114, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "C:\Users\vigosser\Anaconda3\envs\vai\lib\site-packages\torch\nn\functional.py", line 1484, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: index out of range: Tried to access index 21397 out of table with 21127 rows. at C:\w\1\s\tmp_conda_3.7_112106\conda\conda-bld\pytorch_1572952932150\work\aten\src\TH/generic/THTensorEvenMoreMath.cpp:418
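For reference, the failure itself is just an embedding lookup past the end of the weight table, and it can be reproduced in isolation (a minimal sketch; the table size and the offending index are taken from the error message above):

import torch
import torch.nn as nn

# The teacher's word-embedding table has 21127 rows, per the RuntimeError.
emb = nn.Embedding(21127, 768)
# A token id beyond the end of the table, as in the failing input.
ids = torch.tensor([[21397]])
emb(ids)  # raises an index-out-of-range error, matching the traceback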
nlpBeginner commented 4 years ago

You should check your config.json file. The error information suggests that your vocabulary size is 21128, but it seems some of the token IDs in your input_ids exceed that vocabulary size.
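A quick way to confirm the mismatch is to compare the tokenizer's vocabulary with the vocab_size recorded in the checkpoint's config.json (a sketch, assuming the transformers package is installed and using the output path from the command above):

import json
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
with open("/home/vigosser/TinyBERT/FT_bert/config.json") as f:
    config = json.load(f)

# The tokenizer's vocabulary and the model's embedding table must agree.
print("tokenizer vocab size:", tokenizer.vocab_size)
print("config vocab_size:   ", config["vocab_size"])

# Every encoded token id must stay below config["vocab_size"].
ids = tokenizer.encode("一个中文句子")
assert max(ids) < config["vocab_size"], "token id beyond embedding table"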

Also, the TinyBERT model we have released so far was trained only on an English corpus.