google / sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.
Apache License 2.0

Add support for windows arm64 #1037

Closed Nagico2 closed 4 weeks ago

Nagico2 commented 1 month ago

Why this change is needed

I recently tried to install sentencepiece with pip on Windows on ARM. Because python/setup.py does not account for arm64 when it detects the Windows architecture, cmake mistakenly built the library for the AMD64 architecture.

https://github.com/google/sentencepiece/blob/2de10cb30e982b980125d4713236dd2b29cc5f0c/python/setup.py#L125-L137

What has been modified

1. For building from source or pip install from source

I added an arm64 check based on platform.machine(). The existing x86 and x64 checks are unchanged, so builds for those architectures are not affected.
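As a rough illustration of that kind of check, here is a minimal sketch. It is not the code from this PR: the helper name `detect_win_arch` and the returned labels `win32`, `amd64`, and `arm64` are assumptions made for the example, and the real setup.py may structure the logic differently.

```python
import platform
import sys


def detect_win_arch(machine=None, is_64bit=None):
    """Return an architecture label for a Windows build.

    Hypothetical helper for illustration only. The labels
    'win32', 'amd64', and 'arm64' are assumed names.
    """
    if machine is None:
        machine = platform.machine()
    if is_64bit is None:
        # A 64-bit interpreter has a maxsize larger than 2**32.
        is_64bit = sys.maxsize > 2**32

    # Pre-existing x86 / x64 decision, left as it was.
    arch = "amd64" if is_64bit else "win32"

    # Additional check layered on top: the machine string reports ARM64.
    if machine.lower() in ("arm64", "aarch64"):
        arch = "arm64"
    return arch
```

Passing `machine` and `is_64bit` explicitly makes the decision easy to exercise without an ARM device; calling `detect_win_arch()` with no arguments reads the values from the running interpreter.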

I also added a section to the Python build documentation that explains how to build from source on Windows, for the convenience of Windows users.

2. For building wheels for win-arm64 automatically

I've also modified .github/workflows/wheel.yml to build wheels for win-arm64 automatically. However, there is a bug in cibuildwheel: https://github.com/pypa/cibuildwheel/issues/1942#issue-2421737266

Simply put, when cibuildwheel targets the win arm64 arch, platform.machine() returns AMD64 instead of ARM64. This breaks the arch detection in setup.py, which then selects the wrong lib path during the build. So I added a special check for win-arm under cibuildwheel that uses the PYTHON_ARCH environment variable.
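A minimal sketch of such a workaround is shown below. It is not the code from this PR: the helper name `is_win_arm64` is hypothetical, and the assumption that PYTHON_ARCH holds the string `ARM64` for this target should be verified against the actual CI environment.

```python
import os
import platform


def is_win_arm64(environ=None, machine=None):
    """Decide whether the build targets Windows on ARM64.

    Hypothetical helper for illustration only. The compared value
    'ARM64' for PYTHON_ARCH is an assumption.
    """
    if environ is None:
        environ = os.environ
    if machine is None:
        machine = platform.machine()

    # Under cibuildwheel the host interpreter may report AMD64 even when
    # the target is ARM64, so the environment variable takes priority.
    if environ.get("PYTHON_ARCH", "").upper() == "ARM64":
        return True

    # Otherwise fall back to what the running interpreter reports.
    return machine.upper() == "ARM64"
```

Checking the environment variable first means a native ARM64 build, where PYTHON_ARCH is typically unset, still falls through to the platform.machine() result.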

Here are the GitHub Actions build results from my own repo: https://github.com/Nagico2/sentencepiece/actions/runs/10105917616

The outcomes

Windows on ARM users can now build and install the Python wrapper from source. Once the python/setup.py change has been published to PyPI, they can also run pip install sentencepiece to build and install from the source distribution.

Once the win-arm wheels are published to PyPI, Windows on ARM users can run pip install sentencepiece to install the pre-built wheels.

Others

I also tried to add a win-arm build to cmake.yml, but there is another bug in setup-python (https://github.com/actions/setup-python/issues/915), so I have not modified that workflow yet.


Changes to the build code can have serious consequences, so please test them thoroughly.

Thank you all for building the Windows on ARM ecosystem. :)

google-cla[bot] commented 1 month ago

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.