openai / tiktoken

tiktoken is a fast BPE tokeniser for use with OpenAI's models.
MIT License

pyinstaller has some bug that results in improper packaging of tiktoken #43

Closed bofinbabu closed 1 year ago

bofinbabu commented 1 year ago

What could be the fix for this error? I am trying out the library for the first time.

import tiktoken
enc = tiktoken.get_encoding("gpt2")
assert enc.decode(enc.encode("hello world")) == "hello world"
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [47], in <cell line: 2>()
      1 import tiktoken
----> 2 enc = tiktoken.get_encoding("gpt2")
      3 assert enc.decode(enc.encode("hello world")) == "hello world"

File ~/work/p3ds/lib/python3.10/site-packages/tiktoken/registry.py:60, in get_encoding(encoding_name)
     57     assert ENCODING_CONSTRUCTORS is not None
     59 if encoding_name not in ENCODING_CONSTRUCTORS:
---> 60     raise ValueError(f"Unknown encoding {encoding_name}")
     62 constructor = ENCODING_CONSTRUCTORS[encoding_name]
     63 enc = Encoding(**constructor())

ValueError: Unknown encoding gpt2
hauntsaninja commented 1 year ago

How did you install tiktoken?

shirubei commented 1 year ago

Maybe a similar case here. I compiled wechatGPT_Turbo.py into an executable using pyinstaller under Windows 10. When I run the executable directly, it shows the error message below.

C:\Users\Administrator\Downloads>wechatGPT_Turbo.exe
Traceback (most recent call last):
  File "wechatGPT_Turbo.py", line 13, in <module>
    from revChatGPT_Turbo import Chatbot as Turbot
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "PyInstaller\loader\pyimod02_importers.py", line 352, in exec_module
  File "revChatGPT_Turbo.py", line 17, in <module>
    ENCODER = tiktoken.get_encoding("gpt2")
  File "tiktoken\registry.py", line 60, in get_encoding
ValueError: Unknown encoding gpt2

By the way, tiktoken was installed via "pip3 install tiktoken" and is imported in revChatGPT_Turbo.py with a plain "import tiktoken".

But when I run "python wechatGPT_Turbo.py", everything is OK. Any suggestions are appreciated. Thank you!

Jeremy-ttt commented 1 year ago

Same problem as above. It appeared when I built the executable.

hauntsaninja commented 1 year ago

I haven't ever used pyinstaller; it sounds like there's a bug in it? The tiktoken distribution on PyPI contains two packages, tiktoken and tiktoken_ext, and needs both of them for tiktoken.get_encoding("gpt2") to work.

Maybe see if pyinstaller people know what the issue is. I'm willing to make minor adjustments to how tiktoken specifies packaging metadata to support the use case.
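A quick way to check the point about the two packages is to ask Python whether both can be located. The sketch below is only an illustration: missing_modules is a hypothetical helper, not part of tiktoken, and it assumes the importer in use supports importlib.util.find_spec. Run from inside the packaged executable, a non-empty result would suggest the bundler left something out.

```python
from __future__ import annotations

import importlib.util


def missing_modules(module_names: list[str]) -> list[str]:
    """Return the names from module_names that cannot be located for import."""
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised when a parent package is absent or a module has no usable spec.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


# The two packages named above, plus the submodule that appears later in this thread.
REQUIRED = ["tiktoken", "tiktoken_ext", "tiktoken_ext.openai_public"]

if __name__ == "__main__":
    # An empty list means everything was found in the current environment.
    print(missing_modules(REQUIRED))
```

Note that find_spec imports the parent package when given a dotted name, so this check is not entirely side-effect free.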

shirubei commented 1 year ago

Thank you for the response. It seems there's a bug in pyinstaller. I'll open an issue there.

shirubei commented 1 year ago

I don't know much about packaging metadata, but pyinstaller does have an option --copy-metadata PACKAGENAME. If you make the minor changes you mentioned, I'm glad to try them and report back. Thank you.

Jeremy-ttt commented 1 year ago

I solved it with the steps below:

1. Add --hidden-import=tiktoken_ext.openai_public --hidden-import=tiktoken_ext when you use pyinstaller to build the executable.

2. Delete the following code from __init__.py in the "blobfile" module:

with open(os.path.join(_SCRIPT_DIR, "VERSION")) as _version_file:
    __version__ = _version_file.read().strip()

Hope it works for you.
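For anyone scripting their builds, the hidden-import flags from step 1 can be assembled in Python. This is a minimal sketch: build_pyinstaller_args and the script name my_app.py are made up for illustration, and the commented-out call assumes PyInstaller is installed so that PyInstaller.__main__.run is available.

```python
from __future__ import annotations


def build_pyinstaller_args(script: str, hidden_imports: list[str]) -> list[str]:
    """Build a PyInstaller argument list with one --hidden-import flag per module."""
    args = [script]
    for module_name in hidden_imports:
        args.append(f"--hidden-import={module_name}")
    return args


# The two modules named in step 1.
TIKTOKEN_HIDDEN_IMPORTS = ["tiktoken_ext", "tiktoken_ext.openai_public"]

# "my_app.py" is a placeholder for your own entry-point script.
args = build_pyinstaller_args("my_app.py", TIKTOKEN_HIDDEN_IMPORTS)

# Show the equivalent shell command.
print(" ".join(["pyinstaller", *args]))

# To run the build from Python instead of the shell (requires PyInstaller):
# import PyInstaller.__main__
# PyInstaller.__main__.run(args)
```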

shirubei commented 1 year ago

Many thanks! After these steps, I ran into another error: Could not find module 'C:\Users\{MYNAME}\AppData\Local\Temp\_MEI160522\tls_client\dependencies\tls-client-64.dll'.

I created a directory named dll, put tls-client-64.dll in it, and added the option below, which finally solved the problem:

--add-binary "dll\tls-client-64.dll;tls_client/dependencies"

hauntsaninja commented 1 year ago

It looks like some of the issue here is the blobfile dependency. Most people won't need that; I can make that an optional dependency.

hauntsaninja commented 1 year ago

I've made blobfile an optional dependency in 0.3.1.

Based on Jeremy-ttt's message, it sounds like the rest of this can be handled by pyinstaller's --hidden-import.

Let me know if there's anything else I can do here. If not, I'll close this issue soon.

ManlyMoustache commented 1 year ago

Jeremy-ttt's answer prevented me from going totally loco. It worked like a charm. I had been trying --hidden-import for tiktoken_ext but not --hidden-import=tiktoken_ext.openai_public, and adding that seems to have fixed the issue completely!

Thanks a lot!

MysticDragonfly commented 1 year ago

Thank you very much! Jeremy-ttt's method above solved my problem!

MikkoHaavisto commented 1 year ago

To clarify, just adding the hidden imports from step 1 fixed the bug for me and let me build the .exe from my Python script.

hauntsaninja commented 1 year ago

If a comment fixed the issue for you, please show your appreciation via emoji reactions instead of commenting :-)

juanmillans commented 1 year ago

I ran into the same issue using Auto-py-to-exe. Did anyone find a way to solve it?

juanmillans commented 1 year ago

I just solved it! You have to use Auto-py-to-exe and manually add every single library, tiktoken and tiktoken_ext, as one of the comments above said. I'm so happy it worked! By the way, you have to use Auto-py-to-exe for it to work, or type the options yourself in pyinstaller, which will certainly be a pain.

hauntsaninja commented 1 year ago

Closing, since there's nothing for tiktoken to do here. I added a mention of this in an FAQ issue: https://github.com/openai/tiktoken/issues/98

octimot commented 1 year ago

This still seems to be an issue on my end when trying to include this package via a .spec file instead of the --hidden-import CLI argument.

@shirubei Did you open this issue on pyinstaller/issues? I can't seem to find it.

@hauntsaninja Technically, importing tiktoken_ext should also include tiktoken_ext.openai_public, so there's still something missing here... Maybe pyinstaller needs the __init__.py file for tiktoken_ext so that it knows that it's a package, as per the Python manual?

Cheers!

hauntsaninja commented 1 year ago

Doesn't need the __init__.py, tiktoken_ext is a namespace package. We use this to allow extensibility, e.g. see https://github.com/openai/tiktoken#extending-tiktoken
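For readers unfamiliar with namespace packages, here is a small self-contained sketch. It builds a throwaway package called demo_ext_ns (a made-up name) with no __init__.py and lists its modules at runtime with pkgutil.iter_modules. That tiktoken finds its tiktoken_ext plugins in a similar runtime fashion is an assumption here, but it would explain why PyInstaller's static import analysis does not pick them up on its own.

```python
from __future__ import annotations

import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path


def make_demo_namespace(root: Path, namespace_name: str, plugin_name: str) -> None:
    """Create <root>/<namespace_name>/<plugin_name>.py, deliberately without __init__.py."""
    package_dir = root / namespace_name
    package_dir.mkdir()
    (package_dir / f"{plugin_name}.py").write_text("ENCODINGS = {'demo': None}\n")


def discover_plugins(namespace_name: str) -> list[str]:
    """Return the dotted names of all modules found inside a namespace package."""
    namespace = importlib.import_module(namespace_name)
    return sorted(
        info.name
        for info in pkgutil.iter_modules(namespace.__path__, namespace.__name__ + ".")
    )


# Build the throwaway namespace package in a temporary directory.
_demo_root = Path(tempfile.mkdtemp())
make_demo_namespace(_demo_root, "demo_ext_ns", "sample_plugin")
sys.path.insert(0, str(_demo_root))
importlib.invalidate_caches()

demo_namespace = importlib.import_module("demo_ext_ns")
found_plugins = discover_plugins("demo_ext_ns")

# A namespace package has no __init__.py, so it has no file of its own.
print(getattr(demo_namespace, "__file__", None))
# The plugin is only found by scanning the package path at runtime.
print(found_plugins)
```

Nothing in the source text of the importing code mentions sample_plugin, which is the situation --hidden-import is meant to cover.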

octimot commented 1 year ago

Got it, thanks!

I see there's a way to hook namespace packages according to PyInstaller.

I'll dig more and try to figure out what's going on. Maybe it's something weird on my machine...

octimot commented 1 year ago

Never mind! I found the issue in my .spec file.

To anyone else that is as clumsy as I am when not using the command line arguments with pyinstaller, the proper way to include hidden imports via .spec file is to append them directly to the hiddenimports list.

In other words, just add this (preferably after hiddenimports = []):

# add tiktoken_ext to hidden imports
hiddenimports.append('tiktoken_ext')
hiddenimports.append('tiktoken_ext.openai_public')
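As an alternative to editing the .spec file, the hook mechanism mentioned earlier in the thread could carry the same list. The following is an untested sketch of a custom hook, assuming it is saved as hook-tiktoken.py in a directory (my_hooks is a hypothetical name) that is passed to pyinstaller via --additional-hooks-dir.

```python
# hook-tiktoken.py
# Hypothetical custom hook: PyInstaller reads the module-level "hiddenimports"
# list from a hook file when the module the hook is named after (tiktoken)
# is imported by the script being frozen.

# Modules that are loaded dynamically and therefore missed by static analysis.
hiddenimports = [
    "tiktoken_ext",
    "tiktoken_ext.openai_public",
]
```

With this file in place, the build command would look something like: pyinstaller --additional-hooks-dir=my_hooks your_script.py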
Lucienxhh commented 1 year ago

A simpler solution: just add these lines to the code that imports tiktoken.

from tiktoken_ext import openai_public
import tiktoken_ext