huggingface / tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
https://huggingface.co/docs/tokenizers
Apache License 2.0

Python 3.13 support #1639

Open iherasymenko opened 2 weeks ago

iherasymenko commented 2 weeks ago

The library cannot be built/installed with Python 3.13 RC.

Dockerfile:

FROM python:3.13-rc-bookworm
RUN curl https://sh.rustup.rs -sSf | sh -s -- -y
ENV PATH=/root/.cargo/bin:$PATH
RUN pip install tokenizers==0.20.0

Output:

28.56          Compiling tokenizers v0.20.0 (/tmp/pip-install-rtrxn7wj/tokenizers_502d52710ca54c4ea47f73913fe50a86/tokenizers)
28.56          Compiling numpy v0.21.0
28.56          Compiling tokenizers-python v0.20.0 (/tmp/pip-install-rtrxn7wj/tokenizers_502d52710ca54c4ea47f73913fe50a86/bindings/python)
28.56       error[E0425]: cannot find function, tuple struct or tuple variant `PyUnicode_FromKindAndData` in module `pyo3::ffi`
28.56          --> src/tokenizer.rs:326:46
28.56           |
28.56       326 |                     let unicode = pyo3::ffi::PyUnicode_FromKindAndData(
28.56           |                                              ^^^^^^^^^^^^^^^^^^^^^^^^^ help: a function with a similar name exists: `PyUnicode_FromOrdinal`
28.56           |
28.56          ::: /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-ffi-0.21.2/src/unicodeobject.rs:109:5
28.56           |
28.56       109 |     pub fn PyUnicode_FromOrdinal(ordinal: c_int) -> *mut PyObject;
28.56           |     ------------------------------------------------------------- similarly named function `PyUnicode_FromOrdinal` defined here
28.56       
28.56       error[E0425]: cannot find value `PyUnicode_4BYTE_KIND` in module `pyo3::ffi`
28.56          --> src/tokenizer.rs:327:36
28.56           |
28.56       327 |                         pyo3::ffi::PyUnicode_4BYTE_KIND as _,
28.56           |                                    ^^^^^^^^^^^^^^^^^^^^ not found in `pyo3::ffi`
28.56       
28.56       For more information about this error, try `rustc --explain E0425`.
28.56       error: could not compile `tokenizers-python` (lib) due to 2 previous errors
28.56       💥 maturin failed
28.56         Caused by: Failed to build a native library through cargo
28.56         Caused by: Cargo build finished with "exit status: 101": `env -u CARGO PYO3_ENVIRONMENT_SIGNATURE="cpython-3.13-64bit" PYO3_PYTHON="/usr/local/bin/python3.13" PYTHON_SYS_EXECUTABLE="/usr/local/bin/python3.13" "cargo" "rustc" "--features" "pyo3/extension-module" "--message-format" "json-render-diagnostics" "--manifest-path" "/tmp/pip-install-rtrxn7wj/tokenizers_502d52710ca54c4ea47f73913fe50a86/bindings/python/Cargo.toml" "--release" "--lib"`
28.56       Error: command ['maturin', 'pep517', 'build-wheel', '-i', '/usr/local/bin/python3.13', '--compatibility', 'off'] returned non-zero exit status 1
28.56       [end of output]
28.56   
28.56   note: This error originates from a subprocess, and is likely not a problem with pip.
28.56   ERROR: Failed building wheel for tokenizers
28.56 Failed to build tokenizers
28.67 ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (tokenizers)

ArthurZucker commented 2 weeks ago

Hey! We'll ship support as soon as maturin has it!

davidhewitt commented 1 week ago

Maturin has support but you need to bump the PyO3 version to 0.22.

sydney-runkle commented 1 week ago

Happy to help with this :).

ArthurZucker commented 1 week ago

Ah, good catch, I forgot about this. Bumping is gonna be a bit annoying, I think pyo3 <-> numpy has an issue.

davidhewitt commented 1 week ago

Ah, we're just about to release rust-numpy 0.22 so that might unblock here.

davidhewitt commented 6 days ago

rust-numpy 0.22 is now live 🚀
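
For anyone following along, the bump discussed in this thread would presumably look something like the fragment below in bindings/python/Cargo.toml. This is only a sketch: the build log above shows pyo3-ffi 0.21.2 and numpy 0.21.0, so the versions here are the assumed targets, and the real manifest's feature flags and other entries are omitted rather than guessed. Source changes in src/tokenizer.rs may also be needed beyond the version bump.

```toml
# bindings/python/Cargo.toml (illustrative fragment, not the actual manifest)
[dependencies]
# Assumed bump from the 0.21.x series seen in the failing build log.
pyo3 = "0.22"
# rust-numpy; its release line tracks the PyO3 minor version, so both move together.
numpy = "0.22"
```

Rebuilding with the Dockerfile from the original report is one way to check whether the bump is enough.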

ArthurZucker commented 2 days ago

Cool! I'll work on that this week!