lm-sys / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Apache License 2.0

`TypeError: not a string` after deleting special characters with the delete key #1809

Open FANGOD opened 1 year ago

FANGOD commented 1 year ago
今天(2023年6月29日)星期几为星期四。 ("Today, June 29, 2023, is a Thursday.")
USER: 而且4月6号是 星期四 ("And April 6 is a Thursday")   # --> between 是 and 星 there is no real space; it is what was left behind after deleting a special character with the delete key, and it triggers the error below

│ py:316 in _EncodeAsPieces                                                                        │
│                                                                                                  │
│    313 │   │   return _sentencepiece.SentencePieceProcessor__EncodeAsIds(self, text, enable_sam  │
│    314 │                                                                                         │
│    315 │   def _EncodeAsPieces(self, text, enable_sampling, nbest_size, alpha, add_bos, add_eos  │
│ ❱  316 │   │   return _sentencepiece.SentencePieceProcessor__EncodeAsPieces(self, text, enable_  │
│    317 │                                                                                         │
│    318 │   def _EncodeAsSerializedProto(self, text, enable_sampling, nbest_size, alpha, add_bos  │
│    319 │   │   return _sentencepiece.SentencePieceProcessor__EncodeAsSerializedProto(self, text  │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: not a string

Pressing delete on special characters in the input throws this error.
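What seems to happen (my reading, not confirmed): when the delete key removes only part of a multi-byte UTF-8 character, the leftover bytes get decoded with Python's surrogateescape handler, so the prompt string ends up containing a lone surrogate code point. SentencePiece cannot encode that back to UTF-8, and its wrapper raises `TypeError: not a string`. A minimal sketch under that assumption (the tokenizer model path is a placeholder):

import sentencepiece as spm

# Placeholder path to a LLaMA/Vicuna tokenizer model.
sp = spm.SentencePieceProcessor(model_file="tokenizer.model")

# "\udce6" is the kind of lone surrogate that surrogateescape produces
# from a stray byte of a half-deleted UTF-8 character.
text = "而且4月6号是\udce6星期四"

sp.encode(text, out_type=str)  # raises: TypeError: not a string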

ValentinZhao commented 1 year ago

Same error here.

merrymercy commented 1 year ago

@FANGOD Could you help us to fix it?

FANGOD commented 1 year ago

> @FANGOD Could you help us to fix it?

Sorry, I've been very busy recently; I will work on it when I have time.

chensiyao12 commented 1 year ago

ASSISTANT: Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/FastChat/fastchat/serve/cli.py", line 280, in <module>
    main(args)
  File "/home/FastChat/fastchat/serve/cli.py", line 206, in main
    chat_loop(
  File "/home/FastChat/fastchat/serve/inference.py", line 469, in chat_loop
    outputs = chatio.stream_output(output_stream)
  File "/home/FastChat/fastchat/serve/cli.py", line 59, in stream_output
    for outputs in output_stream:
  File "/usr/lib/python3/dist-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/home/FastChat/fastchat/serve/inference.py", line 85, in generate_stream
    input_ids = tokenizer(prompt).input_ids
  File "/usr/lib/python3/dist-packages/transformers/tokenization_utils_base.py", line 2577, in __call__
    encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
  File "/usr/lib/python3/dist-packages/transformers/tokenization_utils_base.py", line 2683, in _call_one
    return self.encode_plus(
  File "/usr/lib/python3/dist-packages/transformers/tokenization_utils_base.py", line 2756, in encode_plus
    return self._encode_plus(
  File "/usr/lib/python3/dist-packages/transformers/tokenization_utils.py", line 649, in _encode_plus
    first_ids = get_input_ids(text)
  File "/usr/lib/python3/dist-packages/transformers/tokenization_utils.py", line 616, in get_input_ids
    tokens = self.tokenize(text, **kwargs)
  File "/usr/lib/python3/dist-packages/transformers/models/llama/tokenization_llama.py", line 174, in tokenize
    return super().tokenize(text, **kwargs)
  File "/usr/lib/python3/dist-packages/transformers/tokenization_utils.py", line 547, in tokenize
    tokenized_text.extend(self._tokenize(token))
  File "/usr/lib/python3/dist-packages/transformers/models/llama/tokenization_llama.py", line 192, in _tokenize
    tokens = self.sp_model.encode(text, out_type=str)
  File "/usr/lib/python3/dist-packages/sentencepiece/__init__.py", line 531, in Encode
    return self._EncodeAsPieces(input, enable_sampling, nbest_size,
  File "/usr/lib/python3/dist-packages/sentencepiece/__init__.py", line 316, in _EncodeAsPieces
    return _sentencepiece.SentencePieceProcessor__EncodeAsPieces(self, text, enable_sampling, nbest_size, alpha, add_bos, add_eos, reverse, emit_unk_piece)
TypeError: not a string

When I use the "lmsys/vicuna-13b-v1.5-16k" model, I hit the same issue, and I hope it can be resolved as soon as possible.
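Until a proper fix lands, one possible workaround (my own sketch, not an official patch) is to strip un-encodable code points from the prompt before it reaches the tokenizer, e.g. around the `input_ids = tokenizer(prompt).input_ids` call in fastchat/serve/inference.py. `sanitize_prompt` is a hypothetical helper:

def sanitize_prompt(prompt: str) -> str:
    # Round-trip through UTF-8, dropping code points that cannot be
    # encoded (e.g. lone surrogates left by a half-deleted character).
    return prompt.encode("utf-8", errors="ignore").decode("utf-8")

# Hypothetical use at the call site in generate_stream:
# input_ids = tokenizer(sanitize_prompt(prompt)).input_ids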

GaelicThunder commented 10 months ago

Any news on this? I keep hitting this error, and the issue is pretty old...

boyugou commented 4 months ago

Same issue here.