openai / tiktoken

tiktoken is a fast BPE tokeniser for use with OpenAI's models.
MIT License

from tiktoken import _tiktoken is causing circular dependency on AWS lambda #184

Closed akshat-g closed 6 months ago

akshat-g commented 11 months ago

Error: cannot import name '_tiktoken' from partially initialized module 'tiktoken' (most likely due to a circular import)

I have added all Python packages as an AWS layer, and my Lambda functions access those dependencies through that layer. There is no file named tiktoken.py in my project.

_tiktoken is defined in a Rust file. Is there anything I am missing when packaging tiktoken for AWS Lambda? I need help here; I've been stuck on this for a while now.
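
The "most likely due to a circular import" hint is only the interpreter's guess: CPython prints it whenever `from tiktoken import _tiktoken` fails while `tiktoken/__init__.py` is still executing, which includes the case where no loadable `_tiktoken` extension sits next to it. A minimal diagnostic sketch follows; the helper name `find_extension` is hypothetical, not part of tiktoken. It reports which `tiktoken` the runtime would import and whether that directory holds an extension file with a suffix this interpreter recognises.

```python
import importlib.machinery
import importlib.util
import os


def find_extension(package: str = "tiktoken", ext: str = "_tiktoken") -> dict:
    """Report where `package` would be imported from and whether a compiled
    `ext` module that *this* interpreter can load sits inside it."""
    # For a top-level name, find_spec locates the package without running
    # its __init__.py, so this works even when `import tiktoken` fails.
    spec = importlib.util.find_spec(package)
    if spec is None:
        return {"origin": None, "extension": None}
    result = {"origin": spec.origin, "extension": None}
    # EXTENSION_SUFFIXES lists the filename endings this interpreter accepts,
    # e.g. ".cpython-312-darwin.so"; a .so named for another Python version
    # or platform is simply invisible to the import system.
    for package_dir in spec.submodule_search_locations or []:
        for suffix in importlib.machinery.EXTENSION_SUFFIXES:
            candidate = os.path.join(package_dir, ext + suffix)
            if os.path.isfile(candidate):
                result["extension"] = candidate
                return result
    return result


if __name__ == "__main__":
    # Run this inside the target environment (e.g. the Lambda runtime).
    print(find_extension())
```

If `extension` comes back `None`, one plausible cause (an assumption, not confirmed for Lambda in this thread) is a layer built for a different Python version or platform than the one the function runs on.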

hauntsaninja commented 11 months ago

I don't know much about Lambda or what an AWS layer is. How are you building / installing tiktoken?

alvarobartt commented 11 months ago

Can you share a file listing the Python dependencies installed in that environment? e.g. pip list >> dependencies.txt

Asking because some dependencies have conflicted with tiktoken in the past, so that may be the case here too.

btakeya commented 9 months ago

Same issue on AWS Glue using tiktoken-0.5.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl, and with a vanilla build (tag 0.5.1, commit 39f29cecdb6fc38d9a3434e5dd15e4de58cf3c80) on my laptop.
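
A wheel tagged `cp311-cp311` targets CPython 3.11 only; its compiled `_tiktoken` typically carries a 3.11-specific filename suffix, so a different interpreter will not find it and the import fails with the same "partially initialized module" message. Whether that is what happened on Glue is not established in this thread. The sketch below (hypothetical helpers `parse_wheel_filename` and `rough_match`) is only a coarse sanity check of a wheel filename against an interpreter; for the full tag rules, the `packaging` library's `packaging.utils.parse_wheel_filename` and `packaging.tags.sys_tags()` are the rigorous route.

```python
import platform
import sys

# Coarse mapping from sys.platform to wheel platform-tag prefixes.
_OS_PREFIXES = {
    "linux": ("manylinux", "musllinux", "linux"),
    "darwin": ("macosx",),
    "win32": ("win",),
}


def parse_wheel_filename(filename: str) -> dict:
    """Split name-version[-build]-python-abi-platform.whl into its tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        # compressed tag sets such as "a.b" mean "a or b"
        "python": python_tag.split("."),
        "abi": abi_tag.split("."),
        "platform": platform_tag.split("."),
    }


def rough_match(filename: str, py=None, machine=None, os_name=None) -> bool:
    """Coarse check: same cpXY tag (or abi3), same OS family, same CPU arch.
    Defaults describe the running interpreter; not a full compatibility test."""
    tags = parse_wheel_filename(filename)
    major, minor = py or sys.version_info[:2]
    machine = (machine or platform.machine()).lower()
    os_name = os_name or sys.platform
    python_ok = f"cp{major}{minor}" in tags["python"] or "abi3" in tags["abi"]
    prefixes = _OS_PREFIXES.get(os_name, ())
    platform_ok = any(
        p == "any" or (p.startswith(prefixes) and p.endswith(machine))
        for p in tags["platform"]
    )
    return python_ok and platform_ok


if __name__ == "__main__":
    wheel = "tiktoken-0.5.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
    print(parse_wheel_filename(wheel))
    print("fits this interpreter (roughly):", rough_match(wheel))
```

The check deliberately ignores details such as the manylinux glibc version and macOS `universal2` wheels.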

dbmanifest commented 9 months ago

Any fix for this?

hauntsaninja commented 8 months ago

There's little I can do without a clear repro. If you're having this issue, I recommend:

mrdomino commented 7 months ago

I may have a clean repro. I'm trying to make a Portfile for MacPorts; the PR is macports/macports-ports#22094.

If you clone that and apply the following patch:

--- Portfile.old    2024-01-08 21:29:33
+++ Portfile    2024-01-08 21:31:50
@@ -36,6 +36,9 @@
                     port:py${python.version}-regex \
                     port:py${python.version}-requests

+    depends_test-append \
+                    port:py${python.version}-hypothesis
+
     # cd ${worksrcpath}
     # sudo cargo update
     # egrep -e '^(name|version|checksum) = ' Cargo.lock | perl -pe 's/^(?:name|version|checksum) = "(.+)"/$1/' | tr '\n' ' ' | perl -pe 's|([0-9a-f]{64})|\1 \\\n|g' | pbcopy
@@ -86,5 +89,7 @@
                     windows_x86_64_gnullvm 0.48.5 0b7b52767868a23d5bab768e390dc5f5c55825b6d30b86c844ff2dc7414044cc \
                     windows_x86_64_msvc 0.48.5 ed94fce61571a4006852b7389a063ab983c02eb1bb37b47f8272ce92d06d9538

+    test.run        yes
+
     livecheck.type  none
 }

Then run sudo port test py312-tiktoken and the error will show up.

mrdomino commented 7 months ago

In case it's helpful, here's the relevant excerpt from the log of that command:

test.log

Stack trace:

:debug:test system:  cd "/opt/local/var/macports/build/_Users_joshin_macports-ports_python_py-tiktoken/py312-tiktoken/work/tiktoken-0.5.2" && py.test-3.12 -o addopts='' 
:info:test ================================== test session starts ===================================
:info:test platform darwin -- Python 3.12.1, pytest-7.4.3, pluggy-1.3.0
:info:test rootdir: /opt/local/var/macports/build/_Users_joshin_macports-ports_python_py-tiktoken/py312-tiktoken/work/tiktoken-0.5.2
:info:test plugins: hypothesis-6.92.1
:info:test collected 0 items / 5 errors
:info:test ========================================= ERRORS =========================================
:info:test ________________________ ERROR collecting tests/test_encoding.py _________________________
:info:test ImportError while importing test module '/opt/local/var/macports/build/_Users_joshin_macports-ports_python_py-tiktoken/py312-tiktoken/work/tiktoken-0.5.2/tests/test_encoding.py'.
:info:test Hint: make sure your test modules/packages have valid Python names.
:info:test Traceback:
:info:test /opt/local/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
:info:test     return _bootstrap._gcd_import(name[level:], package, level)
:info:test tests/test_encoding.py:9: in <module>
:info:test     import tiktoken
:info:test tiktoken/__init__.py:2: in <module>
:info:test     from .core import Encoding as Encoding
:info:test tiktoken/core.py:9: in <module>
:info:test     from tiktoken import _tiktoken
:info:test E   ImportError: cannot import name '_tiktoken' from partially initialized module 'tiktoken' (most likely due to a circular import) (/opt/local/var/macports/build/_Users_joshin_macports-ports_python_py-tiktoken/py312-tiktoken/work/tiktoken-0.5.2/tiktoken/__init__.py)

hauntsaninja commented 7 months ago

If you're repackaging this, it's pretty clear how it would show up: e.g. when testing, you may need to specify the import mode like so (https://github.com/openai/tiktoken/blob/main/pyproject.toml#L40). This makes sense given how pytest works: https://docs.pytest.org/en/7.1.x/explanation/pythonpath.html
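
A self-contained sketch of the mechanism, using a throwaway package (`demo_pkg`, hypothetical) laid out like a tiktoken source tree whose extension was never built: `demo_pkg/core.py` imports a compiled `_ext` module that does not exist. When that directory is first on `sys.path` (pytest's default `prepend` import mode can do this; `--import-mode=append` puts it last instead), Python imports the source tree rather than an installed copy, and the failed `from demo_pkg import _ext` is reported as a circular import because `demo_pkg/__init__.py` has not finished executing.

```python
import os
import subprocess
import sys
import tempfile


def reproduce_shadowing() -> str:
    """Import a package whose compiled submodule is missing and return the
    child interpreter's stderr."""
    with tempfile.TemporaryDirectory() as root:
        pkg = os.path.join(root, "demo_pkg")
        os.makedirs(pkg)
        # Same shape as tiktoken/__init__.py -> tiktoken/core.py
        with open(os.path.join(pkg, "__init__.py"), "w") as f:
            f.write("from .core import Encoding as Encoding\n")
        with open(os.path.join(pkg, "core.py"), "w") as f:
            # `_ext` stands in for the Rust extension, which was never built here
            f.write("from demo_pkg import _ext\n\nclass Encoding:\n    pass\n")
        env = dict(os.environ)
        env.pop("PYTHONSAFEPATH", None)  # keep the default: cwd goes on sys.path
        proc = subprocess.run(
            [sys.executable, "-c", "import demo_pkg"],
            cwd=root,  # `python -c` puts this directory first on sys.path
            env=env,
            capture_output=True,
            text=True,
        )
        return proc.stderr


if __name__ == "__main__":
    print(reproduce_shadowing().strip().splitlines()[-1])
```

The last stderr line should be an ImportError of the same shape as the MacPorts traceback, which likewise points at the source tree under `work/tiktoken-0.5.2/tiktoken/`.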

But I don't use AWS Lambda, so I'm not sure how those folks are encountering this. Presumably they're not trying to pytest tiktoken.

mrdomino commented 7 months ago

You're right: I had set up PYTHONPATH and set --import-mode=append, but never at the same time. Thanks.

mrdomino commented 7 months ago

Hmm, sorry to trouble you further, but I'm now getting a similar error from a subprocess.check_call:

:debug:test system:  cd "/opt/local/var/macports/build/_Users_joshin_macports-ports_python_py-tiktoken/py312-tiktoken/work/tiktoken-0.5.2" && py.test-3.12 -o addopts='' --import-mode=append 
:info:test ================================== test session starts ===================================
:info:test platform darwin -- Python 3.12.1, pytest-7.4.3, pluggy-1.3.0
:info:test rootdir: /opt/local/var/macports/build/_Users_joshin_macports-ports_python_py-tiktoken/py312-tiktoken/work/tiktoken-0.5.2
:info:test plugins: hypothesis-6.92.1
:info:test collected 27 items
:info:test tests/test_encoding.py ....................                                        [ 74%]
:info:test tests/test_misc.py .F                                                              [ 81%]
:info:test tests/test_offsets.py ..                                                           [ 88%]
:info:test tests/test_simple_public.py ..F                                                    [100%]
:info:test ======================================== FAILURES ========================================
:info:test ___________________________ test_optional_blobfile_dependency ____________________________
:info:test     def test_optional_blobfile_dependency():
:info:test         prog = """
:info:test     import tiktoken
:info:test     import sys
:info:test     assert "blobfile" not in sys.modules
:info:test     """
:info:test >       subprocess.check_call([sys.executable, "-c", prog])
:info:test tests/test_misc.py:24: 
:info:test _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
:info:test popenargs = (['/opt/local/Library/Frameworks/Python.framework/Versions/3.12/bin/python3.12', '-c', '\nimport tiktoken\nimport sys\nassert "blobfile" not in sys.modules\n'],)
:info:test kwargs = {}, retcode = 1
:info:test cmd = ['/opt/local/Library/Frameworks/Python.framework/Versions/3.12/bin/python3.12', '-c', '\nimport tiktoken\nimport sys\nassert "blobfile" not in sys.modules\n']
:info:test     def check_call(*popenargs, **kwargs):
:info:test         """Run command with arguments.  Wait for command to complete.  If
:info:test         the exit code was zero then return, otherwise raise
:info:test         CalledProcessError.  The CalledProcessError object will have the
:info:test         return code in the returncode attribute.
:info:test     
:info:test         The arguments are the same as for the call function.  Example:
:info:test     
:info:test         check_call(["ls", "-l"])
:info:test         """
:info:test         retcode = call(*popenargs, **kwargs)
:info:test         if retcode:
:info:test             cmd = kwargs.get("args")
:info:test             if cmd is None:
:info:test                 cmd = popenargs[0]
:info:test >           raise CalledProcessError(retcode, cmd)
:info:test E           subprocess.CalledProcessError: Command '['/opt/local/Library/Frameworks/Python.framework/Versions/3.12/bin/python3.12', '-c', '\nimport tiktoken\nimport sys\nassert "blobfile" not in sys.modules\n']' returned non-zero exit status 1.
:info:test /opt/local/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/subprocess.py:413: CalledProcessError
:info:test ---------------------------------- Captured stderr call ----------------------------------
:info:test Traceback (most recent call last):
:info:test   File "<string>", line 2, in <module>
:info:test   File "/opt/local/var/macports/build/_Users_joshin_macports-ports_python_py-tiktoken/py312-tiktoken/work/tiktoken-0.5.2/tiktoken/__init__.py", line 2, in <module>
:info:test     from .core import Encoding as Encoding
:info:test   File "/opt/local/var/macports/build/_Users_joshin_macports-ports_python_py-tiktoken/py312-tiktoken/work/tiktoken-0.5.2/tiktoken/core.py", line 9, in <module>
:info:test     from tiktoken import _tiktoken
:info:test ImportError: cannot import name '_tiktoken' from partially initialized module 'tiktoken' (most likely due to a circular import) (/opt/local/var/macports/build/_Users_joshin_macports-ports_python_py-tiktoken/py312-tiktoken/work/tiktoken-0.5.2/tiktoken/__init__.py)
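
This `subprocess.check_call` failure plausibly has the same cause: `python -c` puts the current directory (shown as `''`) at `sys.path[0]`, ahead of any `PYTHONPATH` entries, and the child inherits pytest's working directory, which here is the source tree. The sketch below (hypothetical helper `child_sys_path0`) shows what such a child sees, and that the `PYTHONSAFEPATH` environment variable (Python 3.11+) stops the directory from being prepended. Treat `PYTHONSAFEPATH` as one possible workaround, not a fix confirmed in this thread.

```python
import os
import subprocess
import sys


def child_sys_path0(cwd: str, safe_path: bool = False) -> str:
    """Return repr(sys.path[0]) as seen by `python -c` started in `cwd`."""
    env = dict(os.environ)
    env.pop("PYTHONSAFEPATH", None)  # start from a known state
    if safe_path:
        # Python 3.11+: do not prepend the script dir / cwd to sys.path
        # (older interpreters ignore this variable)
        env["PYTHONSAFEPATH"] = "1"
    proc = subprocess.run(
        [sys.executable, "-c", "import sys; print(repr(sys.path[0]))"],
        cwd=cwd,
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout.strip()


if __name__ == "__main__":
    here = os.getcwd()
    print("default:       ", child_sys_path0(here))  # '' means "the cwd"
    print("PYTHONSAFEPATH:", child_sys_path0(here, safe_path=True))
```
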

hauntsaninja commented 6 months ago

Closing, since it's been several months without a clear repro. If you do have a reproducer, please post the information I requested in https://github.com/openai/tiktoken/issues/184#issuecomment-1837412113