openai / tiktoken

tiktoken is a fast BPE tokeniser for use with OpenAI's models.
MIT License

Unknown encoding gpt2 #301

Closed aryagxr closed 1 month ago

aryagxr commented 1 month ago

This is the code I'm trying to run:

tokenizer = tiktoken.get_encoding("gpt2")

and this is the error I get:

{
    "name": "ValueError",
    "message": "Unknown encoding gpt2. Plugins found: ['tiktoken_ext.openai_public']",
    "stack": "---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[21], line 4
      1 import tiktoken
----> 4 tokenizer = tiktoken.get_encoding(\"gpt2\")

File ~/anaconda3/envs/ml-dl/lib/python3.12/site-packages/tiktoken/registry.py:68, in get_encoding(encoding_name)
     65     assert ENCODING_CONSTRUCTORS is not None
     67 if encoding_name not in ENCODING_CONSTRUCTORS:
---> 68     raise ValueError(
     69         f\"Unknown encoding {encoding_name}. Plugins found: {_available_plugin_modules()}\"
     70     )
     72 constructor = ENCODING_CONSTRUCTORS[encoding_name]
     73 enc = Encoding(**constructor())

ValueError: Unknown encoding gpt2. Plugins found: ['tiktoken_ext.openai_public']"
}

From issue #51, here is the full log:

Python 3.12.3
Linux-6.5.0-25-generic-x86_64-with-glibc2.35
Requirement already satisfied: wheel in ./env/lib/python3.12/site-packages (0.43.0)
Requirement already satisfied: tiktoken in ./env/lib/python3.12/site-packages (0.7.0)
Requirement already satisfied: regex>=2022.1.18 in ./env/lib/python3.12/site-packages (from tiktoken) (2024.5.15)
Requirement already satisfied: requests>=2.26.0 in ./env/lib/python3.12/site-packages (from tiktoken) (2.32.2)
Requirement already satisfied: charset-normalizer<4,>=2 in ./env/lib/python3.12/site-packages (from requests>=2.26.0->tiktoken) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in ./env/lib/python3.12/site-packages (from requests>=2.26.0->tiktoken) (3.7)
Requirement already satisfied: urllib3<3,>=1.21.1 in ./env/lib/python3.12/site-packages (from requests>=2.26.0->tiktoken) (2.2.1)
Requirement already satisfied: certifi>=2017.4.17 in ./env/lib/python3.12/site-packages (from requests>=2.26.0->tiktoken) (2024.2.2)
<Encoding 'gpt2'>
['wheel-0.43.0.dist-info', 'pip-24.0.dist-info', 'certifi-2024.2.2.dist-info', 'urllib3', 'tiktoken_ext', 'idna', 'regex-2024.5.15.dist-info', 'urllib3-2.2.1.dist-info', 'wheel', 'tiktoken-0.7.0.dist-info', 'requests-2.32.2.dist-info', 'idna-3.7.dist-info', 'requests', 'regex', 'certifi', 'tiktoken', 'pip', 'charset_normalizer-3.3.2.dist-info', 'charset_normalizer']

Here is what I tried (solutions from #63):

import os
import shutil
import tempfile

# Identify the cache directory
cache_dir = os.path.join(tempfile.gettempdir(), "data-gym-cache")
print(f"Cache directory: {cache_dir}")

# Check if the cache directory exists and clear it if it does
if os.path.exists(cache_dir):
    print("Cache directory found. Attempting to clear it...")
    shutil.rmtree(cache_dir)
    print("Cache directory cleared.")
else:
    print("Cache directory does not exist.")



- Tried reinstalling tiktoken

Any ideas on how I could fix this error?
Thanks in advance!

aryagxr commented 1 month ago

Closing this issue. Fixed the error after reinstalling tiktoken in a new conda environment.
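
For anyone landing here with the same error: since a fresh environment fixed it, it may help to check which installation the interpreter is actually picking up before reinstalling. Below is a minimal sketch of such a check, not an official tiktoken diagnostic. The helper name `describe_install` is hypothetical; it uses only the standard library, so it runs even when the tiktoken install is broken. The final step calls `tiktoken.list_encoding_names()`, and is skipped if the import fails.

```python
import importlib.metadata
import importlib.util
import sys


def describe_install(package: str = "tiktoken") -> dict:
    """Collect basic facts about how `package` resolves in this interpreter.

    Hypothetical helper for illustration; it never imports the package itself.
    """
    info = {
        "python": sys.executable,  # interpreter running this script
        "package": package,
        "location": None,          # file the module would be loaded from
        "search_paths": [],        # directories, for packages/namespace packages
        "version": None,           # installed distribution version, if any
    }
    try:
        spec = importlib.util.find_spec(package)
    except (ImportError, ValueError):
        spec = None
    if spec is not None:
        info["location"] = spec.origin
        info["search_paths"] = list(spec.submodule_search_locations or [])
    try:
        info["version"] = importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        # No distribution with this exact name (expected for tiktoken_ext,
        # which ships as part of the tiktoken distribution).
        pass
    return info


if __name__ == "__main__":
    for name in ("tiktoken", "tiktoken_ext"):
        print(describe_install(name))

    # If tiktoken imports at all, show which encodings it has registered.
    try:
        import tiktoken
        print(tiktoken.list_encoding_names())
    except ImportError as exc:
        print(f"Could not import tiktoken: {exc}")
```

If the reported `python` path or package locations point somewhere other than the environment you expect (for example `./env/...` versus `~/anaconda3/envs/...` as in the logs above), the notebook kernel and the shell are probably using different installations. In that case, reinstalling into the kernel's environment or creating a clean one is a reasonable next step.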