mobilestack closed this issue 1 year ago
Hm, thanks for the detailed environment information, but I'm not able to reproduce.
Can you run export TIKTOKEN_CACHE_DIR=""
and retry? Setting this environment variable to an empty string prevents tiktoken from using a cache for the vocab files it downloads.
Note that even in the simple publicly available tests this code path is tested: https://github.com/openai/tiktoken/blob/3e8620030c68d2fd6d4ec6d38426e7a1983661f5/tests/test_simple_public.py#L9
I set the environment variable, but that did not solve it. Is there a specific path for the cache? I might need to delete the cache manually.
The logic is here: https://github.com/openai/tiktoken/blob/3e8620030c68d2fd6d4ec6d38426e7a1983661f5/tiktoken/load.py#L33
So typically it is the path printed by: python -c 'import tempfile; import os; print(os.path.join(tempfile.gettempdir(), "data-gym-cache"))'
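For anyone who would rather inspect the cache from a script than from a one-liner, here is a rough sketch. It assumes the lookup order in the linked load.py is TIKTOKEN_CACHE_DIR first, then DATA_GYM_CACHE_DIR, then the temp-directory default; the second variable is from my reading of that file, so please check it against the link. The helper names guess_cache_dir, list_cached_files and clear_cache are made up for this example and are not part of tiktoken.

```python
import os
import tempfile


def guess_cache_dir(env=None):
    """Return the directory tiktoken is assumed to use for cached vocab files."""
    env = os.environ if env is None else env
    if "TIKTOKEN_CACHE_DIR" in env:
        # An empty string here means caching is disabled.
        return env["TIKTOKEN_CACHE_DIR"]
    if "DATA_GYM_CACHE_DIR" in env:
        # Assumption: fallback variable, verify against the linked load.py.
        return env["DATA_GYM_CACHE_DIR"]
    return os.path.join(tempfile.gettempdir(), "data-gym-cache")


def list_cached_files(cache_dir):
    """List regular files directly inside cache_dir (empty list if it is absent)."""
    if not cache_dir or not os.path.isdir(cache_dir):
        return []
    return sorted(
        os.path.join(cache_dir, name)
        for name in os.listdir(cache_dir)
        if os.path.isfile(os.path.join(cache_dir, name))
    )


def clear_cache(cache_dir):
    """Delete the cached files and return the paths that were removed."""
    removed = list_cached_files(cache_dir)
    for path in removed:
        os.remove(path)
    return removed


if __name__ == "__main__":
    cache_dir = guess_cache_dir()
    print("cache dir:", cache_dir)
    for path in list_cached_files(cache_dir):
        print(path, os.path.getsize(path), "bytes")
```

Calling clear_cache(guess_cache_dir()) removes the cached files so that the vocab files are downloaded again on the next run. Nothing is deleted unless that call is made explicitly.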
If that doesn't help, maybe you could set a breakpoint and see what the difference between those two dictionaries is.
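If a debugger is awkward to use, a small helper can report the difference between the two dictionaries directly. This is only a sketch: diff_dicts is a hypothetical name, and the sample dictionaries are made-up stand-ins for the real ones, which map byte strings to integer ranks.

```python
def diff_dicts(left, right):
    """Return (only_in_left, only_in_right, changed) for two dictionaries."""
    only_left = {k: left[k] for k in left.keys() - right.keys()}
    only_right = {k: right[k] for k in right.keys() - left.keys()}
    # Keys present in both dictionaries but mapped to different values.
    changed = {
        k: (left[k], right[k])
        for k in left.keys() & right.keys()
        if left[k] != right[k]
    }
    return only_left, only_right, changed


if __name__ == "__main__":
    # Made-up sample data, not real vocabulary entries.
    expected = {b"a": 0, b"b": 1, b"c": 2}
    loaded = {b"a": 0, b"b": 5, b"d": 3}
    only_left, only_right, changed = diff_dicts(expected, loaded)
    print("only in expected:", only_left)
    print("only in loaded:", only_right)
    print("different values:", changed)
```

If all three results are empty, the dictionaries are equal. A truncated or corrupted cached file would most likely show up as many entries in the first or second result.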
Woo, that works. After I deleted the cached files, it runs correctly now. Thanks a lot!
The file may have been corrupted during or after downloading. I'm not sure whether the cached file should be checked before it is used, or whether the assert bpe_ranks == encoder_json_loaded
line could print more information when it fails.
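One way the "check the cached file before use" idea could look is a checksum comparison before the cached bytes are trusted. This is a sketch only and not a description of what tiktoken does. The function read_cached_if_valid and the expected_sha256 parameter are hypothetical, and it assumes the known-good hash of each vocab file is available from somewhere.

```python
import hashlib
import os
import tempfile


def read_cached_if_valid(path, expected_sha256):
    """Return the cached bytes if their SHA-256 matches, else None.

    A file that fails the check is deleted so the caller can download it again.
    """
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        data = f.read()
    if hashlib.sha256(data).hexdigest() == expected_sha256:
        return data
    os.remove(path)  # corrupted or truncated: drop it from the cache
    return None


if __name__ == "__main__":
    # Demonstration with a throwaway file instead of a real vocab file.
    good = b"example vocab contents"
    digest = hashlib.sha256(good).hexdigest()
    path = os.path.join(tempfile.mkdtemp(), "cached-file")
    with open(path, "wb") as f:
        f.write(good[:10])  # simulate a truncated download
    print(read_cached_if_valid(path, digest))  # None, and the file is removed
    print(os.path.exists(path))
```

With a check like this, a bad download would cause a fresh download instead of a failing assert later on.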
The code is like this.
The error message is:
I followed what you suggested running in another issue.
Since I don't have a python command, only python3, I ran everything in a venv.
The results look like this.
Hopefully there is a solution. Many thanks!