Specify a GitHub or local repo, GitHub pull request, arXiv or Sci-Hub paper, YouTube transcript, or documentation URL on the web, and scrape it into a text file and the clipboard for easier LLM ingestion.
Disallowing or allowing special characters does not appear to work.
```
called `Result::unwrap()` on an `Err` value: RuntimeError(StackOverflow)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "/Users/richard.sanders/Documents/ulterior/1filellm/onefilellm.py", line 605, in <module>
    main()
  File "/Users/richard.sanders/Documents/ulterior/1filellm/onefilellm.py", line 592, in main
    compressed_token_count = get_token_count(compressed_text)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/richard.sanders/Documents/ulterior/1filellm/onefilellm.py", line 236, in get_token_count
    tokens = enc.encode(text, disallowed_special=disallowed_special)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/richard.sanders/Documents/ulterior/1filellm/.venv/lib/python3.11/site-packages/tiktoken/core.py", line 124, in encode
    return self._core_bpe.encode(text, allowed_special)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: RuntimeError(StackOverflow)
```
Suggest adding chunking to `get_token_count`, so that the text is passed to `enc.encode` in smaller pieces instead of all at once.
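The chunking suggested above could look roughly like the sketch below. This is not the project's code: `iter_chunks`, `get_token_count_chunked`, and the `max_chars` default are hypothetical names and values chosen for illustration, and `encode` stands in for whatever callable turns a string into a list of tokens. Note that a chunked count can differ slightly from a whole-text count, because tokens cannot span a chunk boundary; cutting after a newline where possible keeps that difference small.

```python
from typing import Callable, Iterator, Sequence


def iter_chunks(text: str, max_chars: int = 100_000) -> Iterator[str]:
    """Yield consecutive pieces of `text`, each at most `max_chars` long.

    Where possible, a piece ends just after a newline so fewer words are split.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Prefer to cut just after the last newline inside the window.
            newline = text.rfind("\n", start, end)
            if newline > start:
                end = newline + 1
        yield text[start:end]
        start = end


def get_token_count_chunked(
    text: str,
    encode: Callable[[str], Sequence],
    max_chars: int = 100_000,  # assumed value, not tuned
) -> int:
    """Sum the token counts of each chunk instead of encoding `text` at once."""
    return sum(len(encode(chunk)) for chunk in iter_chunks(text, max_chars))
```

With tiktoken, `encode` might be something like `lambda s: enc.encode(s, disallowed_special=())`, where `enc` is the encoding object already used in `get_token_count`. Whether bounding the input size this way avoids the `StackOverflow` panic for a given input is an assumption that would need to be checked against the text that triggered it.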