taishi-i / nagisa

A Japanese tokenizer based on recurrent neural networks
https://huggingface.co/spaces/taishi-i/nagisa-demo
MIT License
379 stars, 22 forks

building nagisa on m1 #30

Closed dataf3l closed 2 months ago

dataf3l commented 1 year ago

I am facing this issue:

[notice] To update, run: pip install --upgrade pip
(venv) b@m1 vocab % pip install nagisa
Collecting nagisa
  Using cached nagisa-0.2.8.tar.gz (20.9 MB)
  Preparing metadata (setup.py) ... done
Collecting six
  Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting numpy
  Using cached numpy-1.23.4-cp310-cp310-macosx_11_0_arm64.whl (13.3 MB)
Collecting nagisa
  Using cached nagisa-0.2.7.tar.gz (20.9 MB)
  Preparing metadata (setup.py) ... done
Collecting DyNet
  Using cached dyNET-2.1.2.tar.gz (509 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting cython
  Using cached Cython-0.29.32-py2.py3-none-any.whl (986 kB)
Building wheels for collected packages: nagisa, DyNet
  Building wheel for nagisa (setup.py) ... done
  Created wheel for nagisa: filename=nagisa-0.2.7-cp310-cp310-macosx_11_0_arm64.whl size=21306402 sha256=c559ab30293dffc0d1ae36d215725dec08da0910ed1c3331728c398397258d2f
  Stored in directory: /Users/b/Library/Caches/pip/wheels/cf/38/0b/463d99fdf6d3c736cfcb4124124496513831eeefdc7f896391
  Building wheel for DyNet (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for DyNet (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [101 lines of output]
      /private/var/folders/yv/lystpk8n2015cf8vmqd2yj_c0000gp/T/pip-build-env-rvxcggqa/overlay/lib/python3.10/site-packages/setuptools/dist.py:530: UserWarning: Normalizing 'v2.1.2' to '2.1.2'
        warnings.warn(tmpl.format(**locals()))
      /private/var/folders/yv/lystpk8n2015cf8vmqd2yj_c0000gp/T/pip-build-env-rvxcggqa/overlay/lib/python3.10/site-packages/setuptools/dist.py:771: UserWarning: Usage of dash-separated 'description-file' will not be supported in future versions. Please use the underscore name 'description_file' instead
        warnings.warn(
      running bdist_wheel
      running build
      INFO:root:CMAKE_PATH='/opt/homebrew/bin/cmake'
      INFO:root:MAKE_PATH='/usr/bin/make'
      INFO:root:MAKE_FLAGS='-j 8'
      INFO:root:EIGEN3_INCLUDE_DIR='/private/var/folders/yv/lystpk8n2015cf8vmqd2yj_c0000gp/T/pip-install-v2h7cwoe/dynet_f6727a54d6ce4c5d83d9578e2d0a272a/build/py3.10-64bit/eigen'
      INFO:root:EIGEN3_DOWNLOAD_URL='https://github.com/clab/dynet/releases/download/2.1/eigen-b2e267dc99d4.zip'
      INFO:root:CC_PATH='/usr/bin/gcc'
      INFO:root:CXX_PATH='/usr/bin/g++'
      INFO:root:SCRIPT_DIR='/private/var/folders/yv/lystpk8n2015cf8vmqd2yj_c0000gp/T/pip-install-v2h7cwoe/dynet_f6727a54d6ce4c5d83d9578e2d0a272a'
      INFO:root:BUILD_DIR='/private/var/folders/yv/lystpk8n2015cf8vmqd2yj_c0000gp/T/pip-install-v2h7cwoe/dynet_f6727a54d6ce4c5d83d9578e2d0a272a/build/py3.10-64bit'
      INFO:root:INSTALL_PREFIX='/Users/b/study/jap/vocab/venv/lib/python3.10/site-packages/../../..'
      INFO:root:PYTHON='/Users/b/study/jap/vocab/venv/bin/python3.10'
      cmake version 3.24.1

      CMake suite maintained and supported by Kitware (kitware.com/cmake).
      Apple clang version 13.1.6 (clang-1316.0.21.2.5)
      Target: arm64-apple-darwin21.6.0
      Thread model: posix
      InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
      INFO:root:Creating build directory /private/var/folders/yv/lystpk8n2015cf8vmqd2yj_c0000gp/T/pip-install-v2h7cwoe/dynet_f6727a54d6ce4c5d83d9578e2d0a272a/build/py3.10-64bit
      INFO:root:Fetching Eigen...
      INFO:root:Unpacking Eigen...
      INFO:root:Configuring...
      -- The C compiler identification is AppleClang 13.1.6.13160021
      -- The CXX compiler identification is AppleClang 13.1.6.13160021
      -- Detecting C compiler ABI info
      -- Detecting C compiler ABI info - done
      -- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/usr/bin/gcc - skipped
      -- Detecting C compile features
      -- Detecting C compile features - done
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/usr/bin/g++ - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      CMake Deprecation Warning at CMakeLists.txt:2 (cmake_minimum_required):
        Compatibility with CMake < 2.8.12 will be removed from a future version of
        CMake.

        Update the VERSION argument <min> value or use a ...<max> suffix to tell
        CMake that the project does not need compatibility with older versions.

      -- Optimization level: fast
      -- BACKEND not specified, defaulting to eigen.
      -- Eigen dir is /private/var/folders/yv/lystpk8n2015cf8vmqd2yj_c0000gp/T/pip-install-v2h7cwoe/dynet_f6727a54d6ce4c5d83d9578e2d0a272a/build/py3.10-64bit/eigen
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
      -- Found Threads: TRUE
      -- Found Cython version 0.29.32

      CMAKE_INSTALL_PREFIX="/Users/b/study/jap/vocab/venv"
      PROJECT_SOURCE_DIR="/private/var/folders/yv/lystpk8n2015cf8vmqd2yj_c0000gp/T/pip-install-v2h7cwoe/dynet_f6727a54d6ce4c5d83d9578e2d0a272a"
      PROJECT_BINARY_DIR="/private/var/folders/yv/lystpk8n2015cf8vmqd2yj_c0000gp/T/pip-install-v2h7cwoe/dynet_f6727a54d6ce4c5d83d9578e2d0a272a/build/py3.10-64bit"
      LIBS=""
      EIGEN3_INCLUDE_DIR="/private/var/folders/yv/lystpk8n2015cf8vmqd2yj_c0000gp/T/pip-install-v2h7cwoe/dynet_f6727a54d6ce4c5d83d9578e2d0a272a/build/py3.10-64bit/eigen"
      MKL_LINK_DIRS=""
      WITH_CUDA_BACKEND=""
      CUDA_RT_FILES=""
      CUDA_RT_DIRS=""
      CUDA_CUBLAS_FILES=""
      CUDA_CUBLAS_DIRS=""
      MSVC=""
      fatal: not a git repository (or any of the parent directories): .git
      -- Configuring done
      -- Generating done
      -- Build files have been written to: /private/var/folders/yv/lystpk8n2015cf8vmqd2yj_c0000gp/T/pip-install-v2h7cwoe/dynet_f6727a54d6ce4c5d83d9578e2d0a272a/build/py3.10-64bit
      INFO:root:Compiling...
      [  4%] Building CXX object dynet/CMakeFiles/dynet.dir/deep-lstm.cc.o
      [  4%] Building CXX object dynet/CMakeFiles/dynet.dir/exec.cc.o
      [  4%] Building CXX object dynet/CMakeFiles/dynet.dir/aligned-mem-pool.cc.o
      [  5%] Building CXX object dynet/CMakeFiles/dynet.dir/cfsm-builder.cc.o
      [  8%] Building CXX object dynet/CMakeFiles/dynet.dir/dynet.cc.o
      [  8%] Building CXX object dynet/CMakeFiles/dynet.dir/dict.cc.o
      [ 10%] Building CXX object dynet/CMakeFiles/dynet.dir/devices.cc.o
      [ 11%] Building CXX object dynet/CMakeFiles/dynet.dir/dim.cc.o
      clang: error: the clang compiler does not support '-march=native'
      clang: error: the clang compiler does not support '-march=native'
      clang: error: the clang compiler does not support '-march=native'
      clang: error: the clang compiler does not support '-march=native'
      clang: error: the clang compiler does not support '-march=native'
      clang: error: the clang compiler does not support '-march=native'
      make[2]: *** [dynet/CMakeFiles/dynet.dir/devices.cc.o] Error 1
      make[2]: *** Waiting for unfinished jobs....
      make[2]: *** [dynet/CMakeFiles/dynet.dir/aligned-mem-pool.cc.o] Error 1
      make[2]: *** [dynet/CMakeFiles/dynet.dir/dynet.cc.o] Error 1
      make[2]: *** [dynet/CMakeFiles/dynet.dir/cfsm-builder.cc.o] Error 1
      clang: error: the clang compiler does not support '-march=native'
      clang: error: the clang compiler does not support '-march=native'
      make[2]: *** [dynet/CMakeFiles/dynet.dir/dim.cc.o] Error 1
      make[2]: *** [dynet/CMakeFiles/dynet.dir/deep-lstm.cc.o] Error 1
      make[2]: *** [dynet/CMakeFiles/dynet.dir/dict.cc.o] Error 1
      make[2]: *** [dynet/CMakeFiles/dynet.dir/exec.cc.o] Error 1
      make[1]: *** [dynet/CMakeFiles/dynet.dir/all] Error 2
      make: *** [all] Error 2
      error: /usr/bin/make -j 8
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for DyNet
Successfully built nagisa

any ideas?

denvazh commented 1 year ago

I was able to get it working for myself, but this required building both nagisa and its DyNet dependency from source, directly from their git repositories. This was fine for my purposes, because I was experimenting and only cared about getting it to work. It might be more challenging if it has to be installed automatically as part of a bigger project. Hopefully one day both nagisa and DyNet will publish wheels for macOS on M1 and for linux/arm64 🙄

DyNet came first, since it was the one causing the install error on M1. It seems M1 support has not been released yet (https://github.com/clab/dynet/pull/1648), so it had to be built from source regardless.

Normally it should be possible to install directly from the git repository (pip install git+https://github.com/clab/dynet#egg=dynet), but this didn't work:

  Copying dyNET.egg-info to build/bdist.macosx-12.4-arm64/wheel/dyNET-0.0.0-py3.10.egg-info
  running install_scripts
  error: [Errno 2] No such file or directory: 'LICENSE.txt'
  ----------------------------------------
  ERROR: Failed building wheel for dynet
Failed to build dynet
ERROR: Could not build wheels for dynet which use PEP 517 and cannot be installed directly

Instead I had to clone the dynet project, disable the license copy, build a wheel, and install it locally:

git clone git@github.com:clab/dynet.git
cd dynet
echo "license_files =" >> setup.cfg
brew install eigen
pip install wheel
python setup.py bdist_wheel
pip install build/py3.10-64bit/python/dist/dyNET-0.0.0-cp310-cp310-macosx_12_0_arm64.whl

Then I could install nagisa with pip install nagisa, but it crashed when I actually used it:

Python 3.10.2 (main, Jun 13 2022, 19:02:38) [Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import nagisa
[dynet] random seed: 1234
[dynet] allocating memory: 32MB
[dynet] memory allocation done.
>>> text = 'ペニーは鮮やかな青い魚を買った。'
>>> doc = nagisa.tagging(text)
>>> doc.words
[1]    42608 segmentation fault  python
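
A crash like this takes the whole interpreter down, so when checking an install it can help to run the import in a child process and inspect the exit status. A minimal sketch, assuming a POSIX system; `run_isolated` and `CHECK` are hypothetical names for illustration, not part of nagisa:

```python
import subprocess
import sys


def run_isolated(snippet, timeout=60):
    """Run `snippet` in a fresh interpreter; return (returncode, stdout)."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout.strip()


# Hypothetical check: tag the sentence from this thread and print the token count.
CHECK = "import nagisa; print(len(nagisa.tagging('ペニーは鮮やかな青い魚を買った。').words))"

if __name__ == "__main__":
    code, out = run_isolated(CHECK)
    if code == 0:
        print("ok, tokens:", out)
    elif code < 0:
        # On POSIX, a negative return code means the child was killed by a signal.
        print(f"crashed with signal {-code}")
    else:
        print(f"failed with exit code {code}")
```

The parent process survives either way, which makes this usable in a setup script or CI job.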

So I used the same method and installed nagisa from a local clone of the repository:

git clone git@github.com:taishi-i/nagisa.git
cd nagisa

I had to patch setup.py to force the DyNet dependency, because it was expecting the DyNet38 fork:

diff --git a/setup.py b/setup.py
index 83f8da6..9cc1693 100644
--- a/setup.py
+++ b/setup.py
@@ -73,6 +73,8 @@ def extensions():
 def switch_install_requires():
     major = sys.version_info.major
     minor = sys.version_info.minor
+    return ['six', 'numpy', 'DyNet']
+
     if os.name == 'posix' and major == 3 and minor > 7:
         return ['six', 'numpy', 'DyNet38']
     else:
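
The early return above forces DyNet on every platform. A more selective variant could choose the dependency by platform instead. This is only a sketch: the keyword parameters exist so it can be tried without an M1 at hand, and the final fallback is an assumption, since the `else` branch is cut off in the diff.

```python
import os
import platform
import sys


def switch_install_requires(system=None, machine=None, version=None, os_name=None):
    """Pick the DyNet distribution to depend on (hypothetical variant)."""
    system = system or platform.system()
    machine = machine or platform.machine()
    major, minor = (version or sys.version_info)[:2]
    os_name = os_name or os.name

    # Apple Silicon: use the locally built DyNet wheel instead of DyNet38.
    if system == "Darwin" and machine == "arm64":
        return ["six", "numpy", "DyNet"]
    # Same condition as the original setup.py.
    if os_name == "posix" and major == 3 and minor > 7:
        return ["six", "numpy", "DyNet38"]
    # Assumption: the elided else branch falls back to plain DyNet.
    return ["six", "numpy", "DyNet"]
```

Called with no arguments it inspects the running interpreter, so it could replace the function in setup.py as is.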

With that I was able to build the project:

python setup.py bdist_wheel
pip install dist/nagisa-0.2.8-cp310-cp310-macosx_12_0_arm64.whl

and could see that it's finally working:

Python 3.10.2 (main, Jun 13 2022, 19:02:38) [Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import nagisa
[dynet] random seed: 1234
[dynet] allocating memory: 32MB
[dynet] memory allocation done.
>>> text = 'ペニーは鮮やかな青い魚を買った。'
>>> doc = nagisa.tagging(text)
>>> doc.words
['ペニー', 'は', '鮮やか', 'な', '青い', '魚', 'を', '買っ', 'た', '。']
>>>

jjy0328 commented 11 months ago

thank you

ai-nikolai commented 4 months ago

@denvazh thank you for the write-up

@taishi-i any updates on whether this could become available on ARM-based Macs?

dataf3l commented 4 months ago

I for one vote for all of us to pitch in 10 bucks each so the author(s?) can have a nice new shiny M1.

that aside,

I don't remember this task, or what I was trying to accomplish, or even what the project was about; I guess I was just testing things. guys, today we have llama and gpt, maybe let's use those if the use case is not industrial?

so here is the question, can gpt do the same thing as nagisa?

and if so, do we need nagisa?

those are my humble questions. I am not by any means trying to diminish the value of the authors' contribution, just pointing out that an alternative may exist for whoever has this problem, the alternative being chatgpt

in this context, maybe we don't really need to fix this? or maybe it's low priority?

having said that, MAYBE WE CAN USE CHATGPT ITSELF to fix whatever issue was present back in nov 2022, a few days before chatgpt came out.

taishi-i commented 4 months ago

Thank you everyone for your comments. I apologize for the inconvenience.

This error is not caused by nagisa itself, but by its dependency DyNet, which does not provide a wheel for M1 Macs. I tried to build a DyNet wheel on my own, but I don't have an M1 Mac environment at hand, and I couldn't build it successfully even with GitHub Actions.

It is difficult to solve this problem immediately, so I recommend using alternative methods such as Janome, Fugashi or Sudachi for M1 Mac.

Finally, thank you for considering nagisa. I'm sorry I couldn't help you.

dataf3l commented 4 months ago

hey man, don't worry, you didn't inconvenience anybody. quite the contrary, you helped immensely by making awesome open source software.

I'll file an issue on dynet so they can get the thing done, and then we just wait for a solution.

just because you posted some code online doesn't mean you are under any obligation to help anybody!

so don't worry too much, things will eventually work out.

taishi-i commented 4 months ago

Hi @dataf3l. I'm sorry for not being able to reply until now. Thank you for your comment. I am very grateful to receive such a comment.

To ensure that other Mac M1 users do not face difficulties, please keep this issue open. I cannot promise an immediate solution, but I would like to resume efforts to solve this problem. Until then, I recommend M1 Mac users use the alternative methods mentioned above or consider using Ubuntu. Thank you.

takuto0515 commented 4 months ago

@taishi-i Hi, I am facing the same DyNet setup problem with M2 Pro Max.

I think this project is much simpler than other existing Japanese word segmentation tools and seems very easy to use. I am looking forward to seeing the problem resolved!

taishi-i commented 4 months ago

Hi @takuto0515! Thank you for your message. I am aware that this library is not available on macOS, and it is an issue that I intend to resolve urgently.

To address this problem, I am currently experimenting with creating DyNet wheels using the latest Python on Windows (not Ubuntu). In parallel, I plan to set up a macOS environment and aim to create a wheel there as well. However, I cannot promise immediate availability, so if you need to use it on macOS, I would suggest the alternatives above or running it in an Ubuntu environment with an Intel CPU. I apologize for any inconvenience. Thank you.

BarnabasSzabolcs commented 3 months ago

I can confirm that fugashi works fine (I recommend using it with ipadic).

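import fugashi
import ipadic

# Example input; any Japanese string works here.
text = 'ペニーは鮮やかな青い魚を買った。'
# -Owakati makes MeCab return the tokens as one space-separated string.
tagger = fugashi.GenericTagger(ipadic.MECAB_ARGS + ' -Owakati')
print(tagger.parse(text))

taishi-i commented 2 months ago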

Hi @ai-nikolai, @takuto0515 and everyone who participated in this chat! Thank you for your patience.

Nagisa is now available on macOS with M1/M2 chips. It is compatible with Python versions 3.9 to 3.12. This problem could not have been solved without the installation method you shared, @denvazh. It was really helpful. Thank you!

Please install nagisa using the following command.

pip install nagisa

Here is the basic usage.

import nagisa

text = 'Pythonで簡単に使えるツールです'
words = nagisa.tagging(text)
print(words)
#=> Python/名詞 で/助詞 簡単/形状詞 に/助動詞 使える/動詞 ツール/名詞 です/助動詞

# Get a list of words
print(words.words)
#=> ['Python', 'で', '簡単', 'に', '使える', 'ツール', 'です']

# Get a list of POS-tags
print(words.postags)
#=> ['名詞', '助詞', '形状詞', '助動詞', '動詞', '名詞', '助動詞']

If you encounter any installation errors, please comment again. I apologize for any inconvenience caused. Thank you for considering the use of nagisa. I hope this tool will be useful to you.
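
Before reporting an install error, it may help to confirm that the interpreter really is a native arm64 build in the announced version range. A rough sketch using only the standard library; both helper names are hypothetical, and the 3.9 to 3.12 range is taken from this comment rather than from package metadata:

```python
import platform
import sys


def is_apple_silicon(system=None, machine=None):
    """True for a native arm64 Python on macOS."""
    system = system or platform.system()
    machine = machine or platform.machine()
    return system == "Darwin" and machine == "arm64"


def in_announced_range(version=None):
    """True if the Python version is within 3.9 to 3.12 inclusive."""
    major, minor = (version or sys.version_info)[:2]
    return (3, 9) <= (major, minor) <= (3, 12)


if __name__ == "__main__":
    print("native Apple Silicon Python:", is_apple_silicon())
    print("Python version in announced range:", in_announced_range())
```

A Python running under Rosetta reports x86_64 as its machine type, so the first check would come back False there even on M1/M2 hardware.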

taishi-i commented 2 months ago

I have confirmed that it works on macOS M1/M2 using GitHub Actions. As the problem has been resolved, I will close this issue. If you are unable to install, please reopen the issue and add a comment. Thank you, everyone!

dataf3l commented 2 months ago

It seems to work now; the issue is resolved for me.

➜ study mkdir nagisa
➜ study cd nagisa
➜ nagisa python3 -m venv venv
source venv/bi%
➜ nagisa source venv/bin/activate
(venv) ➜ nagisa pip install nagisa
Collecting nagisa
  Downloading nagisa-0.2.11-cp312-cp312-macosx_11_0_arm64.whl.metadata (6.6 kB)
Collecting six (from nagisa)
  Using cached six-1.16.0-py2.py3-none-any.whl.metadata (1.8 kB)
Collecting numpy (from nagisa)
  Downloading numpy-2.0.0-cp312-cp312-macosx_14_0_arm64.whl.metadata (60 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 60.9/60.9 kB 1.2 MB/s eta 0:00:00
Collecting DyNet38 (from nagisa)
  Downloading dyNET38-2.2-cp312-cp312-macosx_11_0_arm64.whl.metadata (6.5 kB)
Collecting cython (from DyNet38->nagisa)
  Using cached Cython-3.0.10-py2.py3-none-any.whl.metadata (3.2 kB)
Downloading nagisa-0.2.11-cp312-cp312-macosx_11_0_arm64.whl (21.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 21.3/21.3 MB 6.3 MB/s eta 0:00:00
Downloading dyNET38-2.2-cp312-cp312-macosx_11_0_arm64.whl (2.9 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.9/2.9 MB 6.6 MB/s eta 0:00:00
Downloading numpy-2.0.0-cp312-cp312-macosx_14_0_arm64.whl (5.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.0/5.0 MB 7.8 MB/s eta 0:00:00
Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Using cached Cython-3.0.10-py2.py3-none-any.whl (1.2 MB)
Installing collected packages: six, numpy, cython, DyNet38, nagisa
Successfully installed DyNet38-2.2 cython-3.0.10 nagisa-0.2.11 numpy-2.0.0 six-1.16.0
(venv) ➜ nagisa python
Python 3.12.3 (main, Apr 9 2024, 08:09:14) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import nagisa
>>> nagisa.tagging("これは何ですか")
<nagisa.tagger.Tagger._Token object at 0x100cf2f30>
>>> doc = nagisa.tagging("これは何ですか")
>>> doc.words
['これ', 'は', '何', 'です', 'か']

my sincere thanks to the team.

dataf3l commented 2 months ago

one more thing that comes to mind: perhaps chatgpt could also solve the problem of POS tagging?

taishi-i commented 2 months ago

Hi @dataf3l. Thank you for checking. I'm glad to hear it worked without any problems. I'm not sure if this answers your question, but you can get part-of-speech tags without using ChatGPT by accessing doc.postags. If you have any questions about retrieving part-of-speech tags, feel free to ask.

import nagisa

doc = nagisa.tagging("これは何ですか")
doc.words
# ['これ', 'は', '何', 'です', 'か']
doc.postags
# ['代名詞', '助詞', '代名詞', '助動詞', '助詞']
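
Since doc.words and doc.postags are parallel lists, they can be zipped to select words by tag. A small sketch using the values printed above, so it runs without nagisa installed; `words_with_tag` is a hypothetical helper, not part of the nagisa API:

```python
# Values copied from the session above.
words = ['これ', 'は', '何', 'です', 'か']
postags = ['代名詞', '助詞', '代名詞', '助動詞', '助詞']


def words_with_tag(words, postags, wanted):
    """Return the words whose POS tag is in `wanted`, keeping their order."""
    return [w for w, t in zip(words, postags) if t in wanted]


# Keep only the pronouns.
pronouns = words_with_tag(words, postags, {'代名詞'})
print(pronouns)
#=> ['これ', '何']
```

With nagisa installed, the two lists would come from doc.words and doc.postags instead of being hard-coded.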