polm / fugashi

A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis.
MIT License

Add M1 / OSX arm64 wheels #55

Closed. polm closed this 1 year ago

polm commented 2 years ago

It's not clear how complicated this is.

It might be as simple as using cibuildwheel to cross-compile the wheel, which means it could be done right away.

However, it might be the case that cibuildwheel only handles what the wheel build compiles directly and won't take care of the build artifacts of MeCab itself. In that case the MeCab build might need tweaking for cross-compilation. In the worst case it would require an OSX arm64 environment to build MeCab directly.
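A minimal sketch of the cibuildwheel route, for illustration only. `CIBW_ARCHS_MACOS`, `CIBW_ENVIRONMENT_MACOS` and `--platform macos` are real cibuildwheel options; the helper name `cibuildwheel_invocation` is hypothetical, and the assumption is that MeCab is compiled in a before-build step whose configure script honors `CFLAGS`/`CXXFLAGS`/`LDFLAGS`, as autotools scripts generally do. This is not how fugashi's CI is actually configured.

```python
from __future__ import annotations

import os
import shlex

# Compiler/linker flag that targets Apple Silicon.
ARM64_FLAGS = "-arch arm64"


def cibuildwheel_invocation(archs: str = "arm64") -> tuple[list[str], dict[str, str]]:
    """Return (command, environment) for cross-compiling macOS wheels.

    Nothing is executed here; the result is meant for
    subprocess.run(cmd, env=env, check=True) on a macOS host.
    """
    env = dict(os.environ)
    # Real cibuildwheel option: which macOS architectures to build wheels for.
    env["CIBW_ARCHS_MACOS"] = archs
    # Real cibuildwheel option for extra build variables. ASSUMPTION: the
    # (hypothetical) MeCab before-build step reads these, so MeCab is compiled
    # for arm64 too instead of for the Intel build host.
    flags = shlex.quote(ARM64_FLAGS)
    env["CIBW_ENVIRONMENT_MACOS"] = f"CFLAGS={flags} CXXFLAGS={flags} LDFLAGS={flags}"
    cmd = ["cibuildwheel", "--platform", "macos"]
    return cmd, env
```

Even with these flags, a configure script that tries to run the test programs it compiles can fail on an Intel host, because arm64 binaries cannot run there; MeCab's configure may additionally need a cross-compilation `--host` setting. That is the "tweaking the MeCab build" risk described in the comment.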

I have tried the cibuildwheel solution and test wheels are available via pip install fugashi==1.1.2a6. If someone confirms they work I can do a release.
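When testing, pip prints the filename of the wheel it downloads, and the platform tag at the end of that name shows which build was picked. A small sketch that reads it, assuming standard PEP 427 wheel naming; the helper names are hypothetical and the x86_64 filename in the test is made up for illustration.

```python
def wheel_platform_tags(filename: str) -> list:
    """Return the platform tag(s) from a wheel filename (PEP 427 naming)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    # name-version[-build]-python-abi-platform: the platform tag is always last.
    parts = filename[: -len(".whl")].split("-")
    # Compressed tag sets separate alternatives with "."
    # (e.g. "macosx_10_9_x86_64.macosx_11_0_arm64").
    return parts[-1].split(".")


def is_arm64_macos_wheel(filename: str) -> bool:
    """True if any platform tag targets Apple Silicon (arm64 or universal2)."""
    return any(
        tag.startswith("macosx_") and tag.endswith(("_arm64", "_universal2"))
        for tag in wheel_platform_tags(filename)
    )


print(is_arm64_macos_wheel("fugashi-1.1.2a6-cp39-cp39-macosx_11_0_arm64.whl"))
```

A matching tag only says what the wheel claims to support; as the rest of this thread shows, the MeCab code inside it can still be missing or built for the wrong architecture.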

maxhgerlach commented 2 years ago

Hi @polm, I just came over via your comment at https://github.com/SamuraiT/mecab-python3/pull/74#issuecomment-1037229357. I don't have prior experience with this package, but I just gave it a go on my MacBook Pro (Apple M1 Pro CPU):

max@mm1p:~$ python -m venv fugashi-venv
max@mm1p:~$ source fugashi-venv/bin/activate
(fugashi-venv) max@mm1p:~$ pip install -U pip
# ...
Successfully installed pip-22.0.3
(fugashi-venv) max@mm1p:~$ pip install fugashi==1.1.2a6
Collecting fugashi==1.1.2a6
  Downloading fugashi-1.1.2a6-cp39-cp39-macosx_11_0_arm64.whl (46 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 46.3/46.3 KB 1.3 MB/s eta 0:00:00
Installing collected packages: fugashi
Successfully installed fugashi-1.1.2a6
(fugashi-venv) max@mm1p:~$ pip install unidic-lite
# ...
Successfully installed unidic-lite-1.0.8
(fugashi-venv) max@mm1p:~$ python
Python 3.9.10 (main, Feb 10 2022, 15:35:07)
[Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from fugashi import Tagger
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/max/fugashi-venv/lib/python3.9/site-packages/fugashi/__init__.py", line 1, in <module>
    from .fugashi import *
ImportError: dlopen(/Users/max/fugashi-venv/lib/python3.9/site-packages/fugashi/fugashi.cpython-39-darwin.so, 0x0002): symbol not found in flat namespace '_mecab_dict_index'

It sounds like MeCab is not linked in. I've also attached the output of nm -g, which shows a couple of undefined symbols: nm.txt
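To pick the relevant entries out of an `nm -g` listing, undefined symbols are the lines that carry no address, only the type letter `U` and the name. A sketch that filters them, assuming the usual nm output layout. The helper `undefined_symbols` and the sample text are hypothetical (the attached nm.txt is not reproduced here); only `_mecab_dict_index` comes from the traceback.

```python
def undefined_symbols(nm_output: str, prefix: str = "") -> list:
    """Return names of undefined ("U") symbols from `nm -g` output."""
    result = []
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined symbols have three fields (address, type, name);
        # undefined ones have no address, only the type letter and the name.
        if len(fields) == 2 and fields[0] == "U" and fields[1].startswith(prefix):
            result.append(fields[1])
    return result


# Hypothetical sample in the layout nm uses on macOS.
SAMPLE = """\
0000000000003f10 T _PyInit_fugashi
                 U _PyLong_FromLong
                 U _mecab_dict_index
"""
print(undefined_symbols(SAMPLE, prefix="_mecab"))
```

Python C API symbols are normally left undefined in an extension module and resolved from the interpreter at load time, so the interesting entries are the MeCab ones: if they are undefined and no libmecab is available to the loader, dlopen fails with exactly the "symbol not found in flat namespace" error shown in the traceback.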

polm commented 2 years ago

Thanks for checking! It's unfortunate this didn't just work; it's going to be hard for me to troubleshoot without an M1.

I suspect the issue is that MeCab itself has to be compiled for the M1, and this error is due to the .so being for the wrong architecture. I think cross-compilation is possible on the GitHub Actions hosts, but I'll have to look into how it works. That might take a while.
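One way to confirm or rule out the wrong-architecture hypothesis is to read the CPU type from the Mach-O header of the extension module and of any libmecab it loads. A sketch using only the standard library, assuming 64-bit thin binaries or universal ("fat") binaries; the constants are the standard Mach-O values, and `macho_archs` / `file_archs` are hypothetical helper names. On a Mac, `lipo -info` or `file` report the same information.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF  # thin 64-bit Mach-O (stored little-endian on Macs)
FAT_MAGIC = 0xCAFEBABE    # universal binary header (always big-endian)
CPU_TYPES = {0x01000007: "x86_64", 0x0100000C: "arm64"}


def macho_archs(data: bytes) -> list:
    """Return the CPU architectures contained in a Mach-O (thin or fat) binary."""
    if struct.unpack_from(">I", data, 0)[0] == FAT_MAGIC:
        # Fat header: number of slices, then 20-byte fat_arch records
        # whose first field is the slice's CPU type.
        (count,) = struct.unpack_from(">I", data, 4)
        cputypes = [struct.unpack_from(">I", data, 8 + 20 * i)[0] for i in range(count)]
    elif struct.unpack_from("<I", data, 0)[0] == MH_MAGIC_64:
        # Thin binary: the CPU type directly follows the magic number.
        cputypes = [struct.unpack_from("<I", data, 4)[0]]
    else:
        raise ValueError("not a 64-bit Mach-O or universal binary")
    return [CPU_TYPES.get(c, hex(c)) for c in cputypes]


def file_archs(path: str) -> list:
    """Read just enough of a file to inspect its Mach-O header."""
    with open(path, "rb") as f:
        return macho_archs(f.read(4096))
```

For example, `file_archs(".../site-packages/fugashi/fugashi.cpython-39-darwin.so")` on the affected machine would show which architecture the installed module was actually built for.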

Anyway, thanks again for checking!

maxhgerlach commented 2 years ago

No problem. I have very little experience in this area myself, but am happy to help a little if I can.

MasanoriYamada commented 2 years ago

I got the same error with fugashi==1.1.2a6 on my M1 Mac. For anyone having trouble with the same error, here are the versions that worked for me:

fugashi==1.1.1 ipadic==1.0.0

Environment: macOS 12.4, MacBook Pro (14-inch, 2021, Apple M1 Max)

polm commented 2 years ago

Thanks for the report. If you're using 1.1.1 then either you're building from source or using x86 wheels (and x86 Python), which means it should work the same with the latest 1.1.2.

Also note that the dictionaries are platform independent, so it doesn't matter which one you use.

AhnafS commented 2 years ago

1.1.1 didn't work for me; I'm running a MacBook Air with an M1 chip :(

polm commented 2 years ago

To clarify the current situation a bit: my impression is that it is generally harder to get fugashi to run on M1 Macs than on other systems. The reason is that I cannot build M1-native wheels due to a lack of support in GitHub Actions, rather than any issue with the fugashi (or MeCab) code.

If you use an x86_64 Python on an M1 Mac, I believe that you should be able to use fugashi easily, like on other platforms. But there are various good reasons to not want to do that.

If you build fugashi from source with an M1-native Python, that should work if you also build MeCab from source in M1-native format.

If you are using M1-native Python and don't have MeCab installed, or have x86_64 MeCab installed, this won't work.
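The cases above can be summarized as a small decision helper. `platform.machine()` is real and reports `arm64` for a native Apple Silicon interpreter and `x86_64` for an Intel interpreter running under Rosetta 2; the helper `expected_outcome` and its messages are hypothetical and only restate the cases listed in this comment. Finding MeCab's architecture is left to the caller (for instance by inspecting libmecab's Mach-O header).

```python
import platform
from typing import Optional


def expected_outcome(python_arch: str, mecab_arch: Optional[str]) -> str:
    """Restate the M1 cases from the comment; mecab_arch is None if MeCab is absent."""
    if python_arch == "x86_64":
        # Intel Python (under Rosetta 2 on an M1): the x86_64 wheels apply.
        return "x86_64 Python: should work like on other platforms"
    if python_arch == "arm64":
        if mecab_arch == "arm64":
            return "build fugashi from source against the M1-native MeCab"
        # No MeCab, or an x86_64 MeCab, cannot be loaded by an arm64 process.
        return "will not work: MeCab missing or built for x86_64"
    return "case not covered by this comment"


if __name__ == "__main__":
    print(platform.machine(), expected_outcome(platform.machine(), None))
```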

For most Cython/Python projects, it's possible to create M1 wheels by cross-compiling. That doesn't seem to work with fugashi because MeCab is a C++ dependency and doesn't work correctly with naive cross-compiling. I have difficulty troubleshooting this because I own no Apple hardware.

GitHub has just added M1 runners to their official roadmap (https://github.com/github/roadmap/issues/528), so I'll keep an eye on that and add support when it's available. If anyone has suggestions that would make it possible to add support more quickly, I'd be happy to consider them.

polm commented 2 years ago

It looks like earlier this month GitHub added support for M1 runners, but the setup-python action does not support them yet. I will keep an eye on this to see when it's possible to add support.

polm commented 2 years ago

To clarify: I missed it, but the recently announced support for M1 runners only covers self-hosted runners. As I own no Apple products, this doesn't help fugashi/mecab-python3, so we'll still have to wait for the roadmap item.

polm commented 1 year ago

As a quick update, I am still not clear how this can be done. I believe it is possible using self-hosted runners, but I am not sure how to set those up.

I would like to spend more time getting this working, but am unfamiliar with OSX development in general, so it'll take me a while. If anyone more familiar with it can explain how to automate M1 builds that'd be much appreciated. Paying for a cloud machine is not a problem either.

polm commented 1 year ago

This should finally be resolved by #80, addressed in v1.3.0. If you have any issues feel free to post them in this thread.