yongzhuo / ChatGLM2-SFT

ChatGLM2-6B fine-tuning, SFT/LoRA, instruction finetune
Apache License 2.0
107 stars 10 forks

IndexError: piece id is out of range. #3

Closed ccdf1137 closed 1 year ago

ccdf1137 commented 1 year ago

  File "predict.py", line 196, in predict
    output = tokenizer.decode(s)
  File "/home/fumengen6927/chatglm2/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 3476, in decode
    return self._decode(
  File "/home/fumengen6927/chatglm2/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 931, in _decode
    filtered_tokens = self.convert_ids_to_tokens(token_ids, skip_special_tokens=skip_special_tokens)
  File "/home/fumengen6927/chatglm2/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 912, in convert_ids_to_tokens
    tokens.append(self._convert_id_to_token(index))
  File "/Work/fme/ChatGLM2-SFT/chatglm2_6b/ft_chatglm2/glm/tokenization_chatglm.py", line 113, in _convert_id_to_token
    return self.tokenizer.convert_id_to_token(index)
  File "/Work/fme/ChatGLM2-SFT/chatglm2_6b/ft_chatglm2/glm/tokenization_chatglm.py", line 60, in convert_id_to_token
    return self.sp_model.IdToPiece(index)
  File "/home/fumengen6927/chatglm2/lib/python3.8/site-packages/sentencepiece/__init__.py", line 1045, in _batched_func
    return _func(self, arg)
  File "/home/fumengen6927/chatglm2/lib/python3.8/site-packages/sentencepiece/__init__.py", line 1038, in _func
    raise IndexError('piece id is out of range.')

Decoding fails with the error above. How can I fix this?

yongzhuo commented 1 year ago

This is ChatGLM, right?
1. Try the official built-in model.chat(tokenizer, query) instead of the predict() function in predict.py.
2. Or try tokenizer._decode(ids, skip_special_tokens=True).
3. If neither works, post the value of s so I can take a look.
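
One way to check the ids before posting them: the sketch below assumes (the thread does not confirm this) that `s` contains ids outside the range the SentencePiece model can map to pieces, for example a negative padding value or an id at or above the piece count. The helper name `split_decodable_ids` and the vocabulary size of 10 are hypothetical and used only for illustration.

```python
from typing import Iterable, List, Tuple


def split_decodable_ids(ids: Iterable[int], vocab_size: int) -> Tuple[List[int], List[int]]:
    """Separate ids inside [0, vocab_size) from ids outside that range.

    Hypothetical helper, not part of ChatGLM2-SFT or transformers.
    """
    valid: List[int] = []
    invalid: List[int] = []
    for raw in ids:
        token_id = int(raw)  # also accepts numpy / tensor scalars
        if 0 <= token_id < vocab_size:
            valid.append(token_id)
        else:
            invalid.append(token_id)
    return valid, invalid


# Made-up ids and a made-up vocabulary size of 10, for illustration only.
valid, invalid = split_decodable_ids([3, 9, 10, -100, 0], vocab_size=10)
print(valid)    # [3, 9, 0]
print(invalid)  # [10, -100]
```

In predict.py the bound could plausibly come from the SentencePiece processor that the traceback shows as `sp_model`, for instance `tokenizer.tokenizer.sp_model.GetPieceSize()`, though that attribute path is inferred from the traceback and may differ between tokenizer versions. Note that this filter would also drop any special tokens whose ids sit above the piece count. If `invalid` is non-empty, those are the ids worth posting in the issue; decoding only `valid` is a workaround, not a fix for whatever produced the bad ids.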