yuanzhoulvpi2017 / zero_nlp

Chinese NLP solutions (large models, data, models, training, inference)
MIT License
2.95k stars, 363 forks

Running the sh script fails with IndexError: Out of range: piece id is out of range. #72

Closed janglichao closed 1 year ago

janglichao commented 1 year ago

Traceback (most recent call last):
  File "/home/kidd/projects/llms/chatGLM-6B/ChatGLM-6B/zero_nlp/Chatglm6b_ModelParallel_ptuning/main_parallel.py", line 450, in <module>
    main()
  File "/home/kidd/projects/llms/chatGLM-6B/ChatGLM-6B/zero_nlp/Chatglm6b_ModelParallel_ptuning/main_parallel.py", line 277, in main
    print_dataset_example(train_dataset[0])
  File "/home/kidd/projects/llms/chatGLM-6B/ChatGLM-6B/zero_nlp/Chatglm6b_ModelParallel_ptuning/main_parallel.py", line 257, in print_dataset_example
    print("inputs", tokenizer.decode(example["input_ids"]))
  File "/home/kidd/anaconda3/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3476, in decode
    return self._decode(
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b/tokenization_chatglm.py", line 268, in _decode
    return self.sp_tokenizer.decode(token_ids)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b/tokenization_chatglm.py", line 117, in decode
    text = self._get_text_tokenizer().decode(ids)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b/tokenization_chatglm.py", line 29, in decode
    return self.sp.DecodeIds(ids)
  File "/home/kidd/anaconda3/lib/python3.10/site-packages/sentencepiece/__init__.py", line 837, in DecodeIds
    return self.Decode(input=input, out_type=out_type, **kwargs)
  File "/home/kidd/anaconda3/lib/python3.10/site-packages/sentencepiece/__init__.py", line 780, in Decode
    return self._DecodeIds(input)
  File "/home/kidd/anaconda3/lib/python3.10/site-packages/sentencepiece/__init__.py", line 337, in _DecodeIds
    return _sentencepiece.SentencePieceProcessor__DecodeIds(self, ids)
IndexError: Out of range: piece id is out of range.

The setup is 2x 3090, and it doesn't look like a CUDA out-of-memory error either.

janglichao commented 1 year ago

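Solved. The problem is in print_dataset_example. If you are debugging, I suggest commenting this function out so it prints nothing. The arrays in my own training set are probably fairly large, so it overflowed.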
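For anyone who would rather keep the debug print than comment it out, here is a minimal sketch of a guarded version. It assumes the error comes from ids in `input_ids` that fall outside the range the tokenizer can decode, which is what the SentencePiece message suggests but is not confirmed in this thread. The names `split_decodable_ids` and `safe_print_dataset_example` are hypothetical, and `decode` and `vocab_size` stand in for whatever your tokenizer provides (for example `tokenizer.decode` and its vocabulary size).

```python
from typing import Callable, Dict, List, Sequence, Tuple


def split_decodable_ids(
    ids: Sequence[int], vocab_size: int
) -> Tuple[List[int], List[int]]:
    """Split ids into those inside [0, vocab_size) and those outside it."""
    valid = [i for i in ids if 0 <= i < vocab_size]
    invalid = [i for i in ids if not (0 <= i < vocab_size)]
    return valid, invalid


def safe_print_dataset_example(
    example: Dict[str, Sequence[int]],
    decode: Callable[[List[int]], str],
    vocab_size: int,
) -> str:
    """Hypothetical replacement for print_dataset_example.

    Decodes only the ids that are in range and reports the rest,
    instead of letting the decoder raise IndexError.
    """
    valid, invalid = split_decodable_ids(example["input_ids"], vocab_size)
    if invalid:
        # Show a few offending ids so the preprocessing can be fixed.
        print(f"skipping {len(invalid)} out-of-range ids, e.g. {invalid[:5]}")
    text = decode(valid)
    print("inputs", text)
    return text


if __name__ == "__main__":
    # Toy decoder standing in for tokenizer.decode (assumption for the demo).
    toy_vocab = ["<pad>", "hello", "world", "!"]

    def toy_decode(ids: List[int]) -> str:
        return " ".join(toy_vocab[i] for i in ids)

    sample = {"input_ids": [1, 2, 99999, -100, 3]}
    safe_print_dataset_example(sample, toy_decode, vocab_size=len(toy_vocab))
```

If the report shows out-of-range ids, the underlying issue is more likely in how the dataset was tokenized (for example a different tokenizer or vocabulary than the model's) than in the print function itself, so it is worth checking that before training.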