openai / tiktoken

tiktoken is a fast BPE tokeniser for use with OpenAI's models.
MIT License

"pip install tiktoken" fails when I execute "docker build" for PyPy and amd64 on an M1 Mac. #131

Closed: aikawa-7 closed this issue 1 year ago

aikawa-7 commented 1 year ago

The command "pip install tiktoken" fails when I execute "docker build" for PyPy and amd64 on an M1 Mac. We cannot deploy.

I suspect the issue may be related to cross-compilation, but I'm unable to identify the exact cause.

Environment:

Build command:

nerdctl -n k8s.io build --platform linux/amd64 -f dockerfile .
# "nerdctl -n k8s.io build" is used here in place of "docker build".

dockerfile:

# https://github.com/openai/tiktoken/issues/23#issuecomment-1437463984
# copied from 'wheels for ARM64 Linux #23'
FROM pypy:3.9 AS builder
RUN apt-get update && apt-get install -y gcc curl
RUN curl https://sh.rustup.rs -sSf | sh -s -- -y && apt-get install --reinstall libc6-dev -y
ENV PATH="/root/.cargo/bin:${PATH}"
RUN pip install --upgrade pip && pip install tiktoken

Try it without "docker build":

nerdctl -n k8s.io run --platform linux/amd64 -it --name pypy3_container pypy:3 bash
apt-get update && apt-get install -y gcc curl
curl https://sh.rustup.rs -sSf | sh -s -- -y && apt-get install --reinstall libc6-dev -y
PATH="/root/.cargo/bin:${PATH}"
pip install --upgrade pip && pip install tiktoken

Error log:

  #0 158.9       copying tiktoken/registry.py -> build/lib.linux-x86_64-pypy39/tiktoken
  #0 158.9       copying tiktoken/core.py -> build/lib.linux-x86_64-pypy39/tiktoken
  #0 158.9       copying tiktoken/model.py -> build/lib.linux-x86_64-pypy39/tiktoken
  #0 158.9       creating build/lib.linux-x86_64-pypy39/tiktoken_ext
  #0 158.9       copying tiktoken_ext/openai_public.py -> build/lib.linux-x86_64-pypy39/tiktoken_ext
  #0 158.9       running egg_info
  #0 158.9       writing tiktoken.egg-info/PKG-INFO
  #0 158.9       writing dependency_links to tiktoken.egg-info/dependency_links.txt
  #0 158.9       writing requirements to tiktoken.egg-info/requires.txt
  #0 158.9       writing top-level names to tiktoken.egg-info/top_level.txt
  #0 158.9       reading manifest file 'tiktoken.egg-info/SOURCES.txt'
  #0 158.9       reading manifest template 'MANIFEST.in'
  #0 158.9       warning: no files found matching 'Makefile'
  #0 158.9       adding license file 'LICENSE'
  #0 158.9       writing manifest file 'tiktoken.egg-info/SOURCES.txt'
  #0 158.9       copying tiktoken/py.typed -> build/lib.linux-x86_64-pypy39/tiktoken
  #0 158.9       running build_ext
  #0 158.9       running build_rust
  #0 158.9       <jemalloc>: MADV_DONTNEED does not work (memset will be used instead)
  #0 158.9       <jemalloc>: (This is the expected behaviour if you are running under QEMU)
  #0 158.9       <jemalloc>: MADV_DONTNEED does not work (memset will be used instead)
  #0 158.9       <jemalloc>: (This is the expected behaviour if you are running under QEMU)
  #0 158.9           Updating crates.io index
  #0 158.9       cargo rustc --lib --message-format=json-render-diagnostics --manifest-path Cargo.toml --release -v --features pyo3/extension-module --crate-type cdylib --
  #0 158.9       error: `cargo rustc --lib --message-format=json-render-diagnostics --manifest-path Cargo.toml --release -v --features pyo3/extension-module --crate-type cdylib --` failed with code -9
  #0 158.9       [end of output]
  #0 158.9
  #0 158.9   note: This error originates from a subprocess, and is likely not a problem with pip.
  #0 158.9   ERROR: Failed building wheel for tiktoken
  #0 159.0   Building wheel for regex (pyproject.toml): started
  #0 199.5   Building wheel for regex (pyproject.toml): finished with status 'done'
  #0 199.5   Created wheel for regex: filename=regex-2023.5.5-pp39-pypy39_pp73-linux_x86_64.whl size=307847 sha256=d83dcfd2f298f85a445de8f3d10dd92bdf276be4ba49caa8d015181931dbd3a6
  #0 199.5   Stored in directory: /root/.cache/pip/wheels/cb/b2/be/aa7f39218e250941dba541de7dbb51f057c4b55a4021f95311
  #0 199.5 Successfully built regex
  #0 199.5 Failed to build tiktoken
  #0 199.5 ERROR: Could not build wheels for tiktoken, which is required to install pyproject.toml-based projects

hauntsaninja commented 1 year ago

If you're cross-compiling, try setting CARGO_NET_GIT_FETCH_WITH_CLI=true. It works around a weird issue that causes OOMs, and the -9 exit code you're seeing is what an OOM kill looks like: the cargo process was terminated by signal 9 (SIGKILL).

aikawa-7 commented 1 year ago

Thank you very much for your help. The build was successful.
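
For anyone landing here later: the thread does not show the final Dockerfile, so the following is only a sketch of how the suggestion above might be applied to the Dockerfile from the original report. The only intended change is the extra ENV line. Installing git is an assumption, on the basis that cargo needs a git executable on PATH once it is told to fetch with the CLI; drop it if the base image already ships git.

```dockerfile
# Sketch only: the Dockerfile from the report with the suggested variable added.
FROM pypy:3.9 AS builder
# Assumption: git is installed so that cargo can shell out to it.
RUN apt-get update && apt-get install -y gcc curl git
RUN curl https://sh.rustup.rs -sSf | sh -s -- -y && apt-get install --reinstall libc6-dev -y
ENV PATH="/root/.cargo/bin:${PATH}"
# Ask cargo to fetch the crates.io index with the git executable
# rather than its built-in git library.
ENV CARGO_NET_GIT_FETCH_WITH_CLI=true
RUN pip install --upgrade pip && pip install tiktoken
```

The build command stays the same as in the report (nerdctl -n k8s.io build --platform linux/amd64 -f dockerfile .). In the interactive reproduction, the equivalent would be running export CARGO_NET_GIT_FETCH_WITH_CLI=true before pip install tiktoken.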