openai / tiktoken

tiktoken is a fast BPE tokeniser for use with OpenAI's models.
MIT License
12.48k stars, 856 forks

StackOverflow error when encoding a long text (length 1024*1024) for a large model #327

Open YangQiangli opened 3 months ago

YangQiangli commented 3 months ago

When I passed a long text to a large model, with a text length of 1024*1024, a StackOverflow error occurred during tokenization:

thread '<unnamed>' panicked at src/lib.rs:227:33:
called `Result::unwrap()` on an `Err` value: RuntimeError(StackOverflow)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "/home/yang/teststack_1m.py", line 37, in <module>
    encoded_tokens = your_object.encode_request(request_id=request_id, prompt=prompt)
  File "/home/yang/teststack_1m.py", line 19, in encode_request
    prompt_token_ids = self.tokenizer.encode(prompt, add_special_tokens=True)
  File "/usr/local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2600, in encode
    encoded_inputs = self.encode_plus(
  File "/usr/local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3008, in encode_plus
    return self._encode_plus(
  File "/usr/local/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 719, in _encode_plus
    first_ids = get_input_ids(text)
  File "/usr/local/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 686, in get_input_ids
    tokens = self.tokenize(text, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 617, in tokenize
    tokenized_text.extend(self._tokenize(token))
  File "/root/.cache/huggingface/modules/transformers_modules/tokenization_chatglm.py", line 88, in _tokenize
    ids = self.tokenizer.encode(text)
  File "/usr/local/lib/python3.10/site-packages/tiktoken/core.py", line 124, in encode
    return self._core_bpe.encode(text, allowed_special)
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: RuntimeError(StackOverflow)
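
One possible workaround, not confirmed by the maintainers in this thread, is to encode the prompt in smaller pieces and concatenate the token ids. The sketch below assumes that the overflow is tied to the size of the single string handed to `encode`, which may not hold for every input. The helper name `encode_in_chunks` and the `max_chars` default are hypothetical, and `encode` stands in for whatever callable you already use (for example `self.tokenizer.encode` from the traceback above). Tokens near chunk boundaries can differ from what a single-pass encode would produce, so treat the result as an approximation.

```python
from typing import Callable, List


def encode_in_chunks(
    text: str,
    encode: Callable[[str], List[int]],
    max_chars: int = 100_000,  # hypothetical default; tune for your input
) -> List[int]:
    """Encode `text` piece by piece and concatenate the token ids.

    Each piece is at most `max_chars` characters long. Where possible the
    cut is placed just before a space so that words are not split.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    tokens: List[int] = []
    start = 0
    n = len(text)
    while start < n:
        end = min(start + max_chars, n)
        if end < n:
            # Look for the last space inside the window (excluding position
            # `start` itself, so every iteration makes progress).
            cut = text.rfind(" ", start + 1, end)
            if cut > start:
                end = cut
        tokens.extend(encode(text[start:end]))
        start = end
    return tokens


# Usage sketch (assumes tiktoken is installed and the encoding is available):
#   import tiktoken
#   enc = tiktoken.get_encoding("cl100k_base")
#   ids = encode_in_chunks(prompt, enc.encode)
```

Cutting before a space keeps the leading space attached to the following word, which is how many BPE vocabularies represent word starts. That is a design choice for this sketch rather than a guarantee of identical output.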