rteabeault / AnkiSpacy

Issues with Japanese model installation #7

Open nlovell1 opened 3 years ago

nlovell1 commented 3 years ago

See this reply: https://github.com/kaegi/MorphMan/pull/221#issuecomment-754379723

rteabeault commented 3 years ago

Thanks so much for taking a look at this @thinkingbox12. I am currently trying to reproduce, but it seems that on the latest version of Windows there is an issue with one of Spacy's dependencies, numpy, that I need to work past first.

The current Numpy installation fails to pass a sanity check due to a bug in the windows runtime. See this issue for more information: https://tinyurl.com/y3dm3h86

This appears to be an issue that Microsoft is in the process of fixing. If I can work around it I will try to reproduce your issue. At first glance, the error you are getting appears to be caused by something missing on your Windows machine that sudachi needs:

ImportError: DLL load failed while importing _dartsclone: The specified module could not be found.

I will report back as soon as I figure something out.

nlovell1 commented 3 years ago

Had a little bit of time to get you the error I was having on Ubuntu when trying to install the Japanese (large) model. Let me know if there's anything else I can do.

running build_ext
cythoning sudachipy/latticenode.pyx to sudachipy/latticenode.c
/home/.local/share/Anki2/addons21/src/user_files/packages/Cython/Compiler/Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /tmp/pip-install-1ydi_wzr/sudachipy_07a9ba150165499b926687bb1b596868/sudachipy/latticenode.pxd
  tree = Parsing.p_module(s, pxd, full_module_name)
cythoning sudachipy/lattice.pyx to sudachipy/lattice.c
/home/.local/share/Anki2/addons21/src/user_files/packages/Cython/Compiler/Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /tmp/pip-install-1ydi_wzr/sudachipy_07a9ba150165499b926687bb1b596868/sudachipy/lattice.pxd
  tree = Parsing.p_module(s, pxd, full_module_name)
cythoning sudachipy/tokenizer.pyx to sudachipy/tokenizer.c
/home/.local/share/Anki2/addons21/src/user_files/packages/Cython/Compiler/Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /tmp/pip-install-1ydi_wzr/sudachipy_07a9ba150165499b926687bb1b596868/sudachipy/tokenizer.pyx
  tree = Parsing.p_module(s, pxd, full_module_name)
building 'sudachipy.latticenode' extension
creating build/temp.linux-x86_64-3.8
creating build/temp.linux-x86_64-3.8/sudachipy
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/usr/local/share/anki/bin/include/python3.8 -c sudachipy/latticenode.c -o build/temp.linux-x86_64-3.8/sudachipy/latticenode.o
ERROR: Exception:
Traceback (most recent call last):
  File "/home/.local/share/Anki2/addons21/src/_vendor/distutils/unixccompiler.py", line 117, in _compile
    self.spawn(compiler_so + cc_args + [src, '-o', obj] +
  File "distutils/ccompiler.py", line 910, in spawn
  File "distutils/spawn.py", line 36, in spawn
  File "distutils/spawn.py", line 157, in _spawn_posix
distutils.errors.DistutilsExecError: command 'gcc' failed with exit status 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/.local/share/Anki2/addons21/src/_vendor/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/home/.local/share/Anki2/addons21/src/_vendor/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/home/.local/share/Anki2/addons21/src/_vendor/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/home/.local/share/Anki2/addons21/src/_vendor/setuptools/command/install.py", line 61, in run
    return orig.install.run(self)
  File "/home/.local/share/Anki2/addons21/src/_vendor/distutils/command/install.py", line 545, in run
    self.run_command('build')
  File "/home/.local/share/Anki2/addons21/src/_vendor/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/home/.local/share/Anki2/addons21/src/_vendor/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/home/.local/share/Anki2/addons21/src/_vendor/distutils/command/build.py", line 135, in run
    self.run_command(cmd_name)
  File "/home/local/share/Anki2/addons21/src/_vendor/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/home/.local/share/Anki2/addons21/src/_vendor/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/home/.local/share/Anki2/addons21/src/_vendor/setuptools/command/build_ext.py", line 79, in run
    _build_ext.run(self)
  File "/home/.local/share/Anki2/addons21/src/user_files/packages/Cython/Distutils/old_build_ext.py", line 186, in run
    _build_ext.build_ext.run(self)
  File "/home/.local/share/Anki2/addons21/src/_vendor/distutils/command/build_ext.py", line 340, in run
    self.build_extensions()
  File "/home/.local/share/Anki2/addons21/src/user_files/packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
    _build_ext.build_ext.build_extensions(self)
  File "/home/.local/share/Anki2/addons21/src/_vendor/distutils/command/build_ext.py", line 449, in build_extensions
    self._build_extensions_serial()
  File "/home/.local/share/Anki2/addons21/src/_vendor/distutils/command/build_ext.py", line 474, in _build_extensions_serial
    self.build_extension(ext)
  File "/home/local/share/Anki2/addons21/src/_vendor/setuptools/command/build_ext.py", line 196, in build_extension
    _build_ext.build_extension(self, ext)
  File "/home/.local/share/Anki2/addons21/src/_vendor/distutils/command/build_ext.py", line 528, in build_extension
    objects = self.compiler.compile(sources,
  File "distutils/ccompiler.py", line 574, in compile
  File "/home/.local/share/Anki2/addons21/src/_vendor/distutils/unixccompiler.py", line 120, in _compile
    raise CompileError(msg)
distutils.errors.CompileError: command 'gcc' failed with exit status 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/local/share/Anki2/addons21/src/_vendor/pip/_internal/cli/base_command.py", line 224, in _main
    status = self.run(options, args)
  File "/home/.local/share/Anki2/addons21/src/_vendor/pip/_internal/cli/req_command.py", line 180, in wrapper
    return func(self, options, args)
  File "/home/.local/share/Anki2/addons21/src/_vendor/pip/_internal/commands/install.py", line 394, in run
    installed = install_given_reqs(
  File "/home/.local/share/Anki2/addons21/src/_vendor/pip/_internal/req/__init__.py", line 82, in install_given_reqs
    requirement.install(
  File "/home/.local/share/Anki2/addons21/src/_vendor/pip/_internal/req/req_install.py", line 840, in install
    success = install_legacy(
  File "/home/.local/share/Anki2/addons21/src/_vendor/pip/_internal/operations/install/legacy.py", line 95, in install
    exec(theargs, globals(), globals())
  File "<string>", line 1, in <module>
  File "/tmp/pip-install-1ydi_wzr/sudachipy_07a9ba150165499b926687bb1b596868/setup.py", line 25, in <module>
    setup(name="SudachiPy",
  File "/home/.local/share/Anki2/addons21/src/_vendor/setuptools/__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "/home/.local/share/Anki2/addons21/src/_vendor/distutils/core.py", line 163, in setup
    raise SystemExit("error: " + str(msg))
SystemExit: error: command 'gcc' failed with exit status 1

rteabeault commented 3 years ago

Oh that looks fun. So gcc is failing to compile sudachi on Linux. Neither of these issues seems to be a problem with this Anki addon itself; both look OS related. It would still be good to figure out how to work around them and document the workarounds.
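The log above stops at "command 'gcc' failed with exit status 1" and does not include gcc's own error output, so the actual cause is not known from this thread. As a first diagnostic, a minimal sketch like the following reports whether a compiler is on the PATH and whether the Python headers exist for the interpreter that runs it. The function name `check_build_prerequisites` is hypothetical, and a missing `Python.h` is only one possible cause, not a confirmed one.

```python
import os
import shutil
import sysconfig


def check_build_prerequisites(compiler="gcc"):
    """Report whether a C compiler and the Python headers can be found.

    This only inspects the interpreter that runs the script. Anki bundles
    its own Python, so results inside Anki may differ.
    """
    include_dir = sysconfig.get_paths()["include"]
    return {
        # None when the compiler is not on PATH
        "compiler_path": shutil.which(compiler),
        "include_dir": include_dir,
        # C extensions such as sudachipy's need Python.h to compile
        "has_python_h": os.path.isfile(os.path.join(include_dir, "Python.h")),
    }


if __name__ == "__main__":
    for key, value in check_build_prerequisites().items():
        print(f"{key}: {value}")
```

Running the failing gcc command from the log by hand in a terminal would show the real compiler error, which is the more reliable next step.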

nlovell1 commented 3 years ago

I'd love to help. Where do I begin? Can also test on a Mac tomorrow.

rteabeault commented 3 years ago

Ok. I have seen three problems so far and here are the workarounds

  1. Windows only: There is an issue with spacy's dependency numpy on Windows. After installing a spacy model on Windows (if you have the Windows October Update 2004) you may see

    The current Numpy installation ('<some_path_to_numpy_init_file>') fails to pass a sanity check due to a bug in the windows runtime. See this issue for more information: https://tinyurl.com/y3dm3h86

    To fix this, open the file at some_path_to_numpy_init_file shown in the error. Replace the lines

    if sys.platform == "win32" and sys.maxsize > 2**32:
        _win_os_check()

    with

    if sys.platform == "win32" and sys.maxsize > 2**32:
        # _win_os_check()
        pass

    Once https://developercommunity.visualstudio.com/content/problem/1207405/fmod-after-an-update-to-windows-2004-is-causing-a.html is resolved this step should be unnecessary.

  2. Windows only: You may see the error

    ImportError: Japanese support requires SudachiPy and SudachiDict-core (https://github.com/WorksApplications/SudachiPy). Install with `pip install sudachipy sudachidict_core` or install spaCy with `pip install spacy[ja]`.

    And further up in the error message you see

    ImportError: DLL load failed while importing _dartsclone: The specified module could not be found.

    This can be fixed by installing the Visual C++ Redistributable.

    1. Go to https://visualstudio.microsoft.com/downloads/
    2. At the bottom select image
    3. Download and install image for your system

  3. Windows only: When running a Morphman recalc you may see the following error

    OSError: [WinError 1314] A required privilege is not held by the client: 'C:\\workspace\\AnkiSpacy\\src\\user_files\\packages\\sudachidict_core' -> 'C:\\workspace\\AnkiSpacy\\src\\user_files\\packages\\sudachidict'

    Further up in the error you will also see

      File "C:\workspace\AnkiSpacy\src\user_files\packages\sudachipy\config.py", line 56, in create_default_link_for_sudachidict_core
        dict_path = Path(import_module('sudachidict').__file__).parent
      File "importlib\__init__.py", line 127, in import_module
      File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
      File "<frozen importlib._bootstrap>", line 991, in _find_and_load
      File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked
    ModuleNotFoundError: No module named 'sudachidict'

    This seems to be sudachi trying to create a symlink to the installed sudachidict (in this case, by default it is sudachidict_core). I thought restarting Anki as administrator would fix the problem, but Morphman then disappeared from my Anki tools. I will try to figure that out tomorrow. In the meantime you can create the symlink manually:

    1. Start a command prompt as administrator.
    2. Create a symlink from sudachidict to sudachidict_core
      mklink /D <anki_spacy_addon_path>\packages\sudachidict <anki_spacy_addon_path>\packages\sudachidict_core
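For anyone who prefers to script workaround 3, here is a minimal Python sketch of the same symlink step. The helper name `link_sudachidict` and its `packages_dir` argument are hypothetical; point it at the packages directory of your own install. On Windows, `os.symlink` needs the same privilege as `mklink`, so it must be run from an elevated process or with Developer Mode enabled, otherwise it raises the same WinError 1314.

```python
import os
from pathlib import Path


def link_sudachidict(packages_dir):
    """Create packages_dir/sudachidict as a directory symlink to sudachidict_core."""
    packages = Path(packages_dir)
    target = packages / "sudachidict_core"
    link = packages / "sudachidict"
    if not target.is_dir():
        raise FileNotFoundError(f"{target} does not exist")
    if link.exists() or link.is_symlink():
        return link  # already present, leave it alone
    # Equivalent of `mklink /D sudachidict sudachidict_core` on Windows
    os.symlink(target, link, target_is_directory=True)
    return link
```

The function does nothing if the link is already there, so it is safe to run more than once.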

Tomorrow I will look into the linux compiler issue.

I'd love to help. Where do I begin? Can also test on a Mac tomorrow.

See if you can get Japanese working with the workarounds here. After that, whatever else you can test is appreciated. Chinese may be a good candidate because I believe it also requires extra 3rd party dependencies, like Japanese.

rteabeault commented 3 years ago

Did not mean to close this. Reopening.

nlovell1 commented 3 years ago

Japanese small model (Windows) worked after these fixes (install and recalc). Went to install the medium model right after. Got this exception:

Traceback (most recent call last):
  File "C:\Users\AppData\Roaming\Anki2\addons21\src\_vendor\pip\_internal\cli\base_command.py", line 224, in _main
    status = self.run(options, args)
  File "C:\Users\AppData\Roaming\Anki2\addons21\src\_vendor\pip\_internal\cli\req_command.py", line 180, in wrapper
    return func(self, options, args)
  File "C:\Users\AppData\Roaming\Anki2\addons21\src\_vendor\pip\_internal\commands\install.py", line 452, in run
    self._handle_target_dir(
  File "C:\Users\AppData\Roaming\Anki2\addons21\src\_vendor\pip\_internal\commands\install.py", line 505, in _handle_target_dir
    shutil.rmtree(target_item_dir)
  File "shutil.py", line 730, in rmtree
  File "shutil.py", line 608, in _rmtree_unsafe
  File "shutil.py", line 606, in _rmtree_unsafe
PermissionError: [WinError 5] Access is denied: 'C:\\Users\\AppData\\Roaming\\Anki2\\addons21\\src\\user_files\\packages\\dartsclone\\_dartsclone.cp38-win_amd64.pyd'

I noticed it reinstalls all the packages for each model (cython, sortedcontainers, dartsclone, sudachipy). Just something to note

  1. Tried again to reinstall large model, and the previous error was solved by running Anki as admin.

An immediate recalc (large model) resulted in this error:

Anki 2.1.35 (84dcaa86) Python 3.8.0 Qt 5.14.2 PyQt 5.14.2
Platform: Windows 10
Flags: frz=True ao=True sv=1
Add-ons, last update check: 2021-01-04 22:26:33

Caught exception:
Traceback (most recent call last):
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\__init__.py", line 20, in onMorphManRecalc
    main.main()
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\main.py", line 573, in main
    allDb = mkAllDb(cur)
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\main.py", line 195, in mkAllDb
    ms = getMorphemes(morphemizer, fieldValue, ts)
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\morphemes.py", line 166, in getMorphemes
    ms = morphemizer.getMorphemesFromExpr(expression)
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\morphemizer.py", line 52, in getMorphemesFromExpr
    morphs = self._getMorphemesFromExpr(expression)
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\deps\spacy\morphemizer.py", line 21, in _getMorphemesFromExpr
    self.nlp = spacy.load(self.model_path)
  File "C:\Users\AppData\Roaming\Anki2\addons21\src\user_files\packages\spacy\__init__.py", line 30, in load
    return util.load_model(name, **overrides)
  File "C:\Users\AppData\Roaming\Anki2\addons21\src\user_files\packages\spacy\util.py", line 172, in load_model
    return load_model_from_path(Path(name), **overrides)
  File "C:\Users\AppData\Roaming\Anki2\addons21\src\user_files\packages\spacy\util.py", line 220, in load_model_from_path
    component = nlp.create_pipe(factory, config=config)
  File "C:\Users\AppData\Roaming\Anki2\addons21\src\user_files\packages\spacy\language.py", line 310, in create_pipe
    raise KeyError(Errors.E002.format(name=name))
KeyError: "[E002] Can't find factory for 'parser'. This usually happens when spaCy calls `nlp.create_pipe` with a component name that's not built in - for example, when constructing the pipeline from a model's meta.json. If you're using a custom component, you can write to `Language.factories['parser']` or remove it from the model meta and add it via `nlp.add_pipe` instead."

However, after a restart of Anki, I could again recalc using the large model. Why is this behaviour occurring?

rteabeault commented 3 years ago

PermissionError: [WinError 5] Access is denied: 'C:\Users\AppData\Roaming\Anki2\addons21\src\user_files\packages\dartsclone\_dartsclone.cp38-win_amd64.pyd'

Do you get this every time you repeat the same steps? I am wondering if it is similar to the permissions problem for symlinks. If you can reproduce this, can you enable Developer Mode in Windows and see if that fixes it?

Settings -> Update & Security -> For Developers -> Developer Mode

I noticed it reinstalls all the packages for each model (cython, sortedcontainers, dartsclone, sudachipy). Just something to note

This is known. The code to install via pip uses the -t and --upgrade options. In this case it always tries to install the packages even if they are already there. It is unfortunate but I could not find a better way to do it.
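To illustrate what those options do, here is a minimal sketch that builds an equivalent command line. It is not the add-on's actual code: the tracebacks above show pip being called through a vendored copy, and the helper name `build_pip_install_command` is hypothetical. With `--target` and `--upgrade` together, pip replaces whatever is already in the target directory, which matches the `shutil.rmtree(target_item_dir)` call in the traceback.

```python
import sys


def build_pip_install_command(package, target_dir):
    """Return an argv list for installing `package` into `target_dir`."""
    return [
        sys.executable, "-m", "pip", "install",
        package,
        "--target", str(target_dir),  # same as -t: install into this directory
        "--upgrade",                  # replace anything already in the target
    ]
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` from a normal Python installation.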

KeyError: "[E002] Can't find factory for 'parser'. This usually happens when spaCy calls nlp.create_pipe with a component name that's not built in - for example, when constructing the pipeline from a model's meta.json. If you're using a custom component, you can write to Language.factories['parser'] or remove it from the model meta and add it via nlp.add_pipe instead."

I am not sure what is happening here. I will try to reproduce this.

rteabeault commented 3 years ago

PermissionError: [WinError 5] Access is denied: 'C:\Users\AppData\Roaming\Anki2\addons21\src\user_files\packages\dartsclone\_dartsclone.cp38-win_amd64.pyd'

I have reproduced this. If you install ja_core_news_md, recalc, and then try to install any other model that depends on dartsclone, you will see this error. I believe it is because the file listed in the error is still being held open by the Anki process. If you restart Anki and try to install again you won't see this error. I also believe that even when you see the error the model has been installed properly; you just need to restart. This is a byproduct of what you astutely observed: dependencies are reinstalled for each model. It is unfortunate. I will try to think of a fix, but at this point I am not sure how to resolve this without restarting Anki.
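One way to detect this situation before an install, sketched here as an assumption rather than anything the add-on currently does, is to list the modules the running process has already imported from the packages directory. The helper name `modules_loaded_from` is hypothetical. If the list is not empty, a reinstall into that directory may hit the locked-file error on Windows and a restart is the safer route.

```python
import sys
from pathlib import Path


def modules_loaded_from(directory):
    """Return names of imported modules whose files live under `directory`."""
    root = Path(directory).resolve()
    loaded = []
    for name, module in list(sys.modules.items()):
        module_file = getattr(module, "__file__", None)
        if not module_file:
            continue  # built-in or namespace module, nothing on disk to lock
        try:
            Path(module_file).resolve().relative_to(root)
        except ValueError:
            continue  # file lives somewhere else
        loaded.append(name)
    return sorted(loaded)
```

This only sees Python modules. It does not prove a file is locked, so treat a non-empty result as a hint to restart rather than a definitive check.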

rteabeault commented 3 years ago

I continue to run into problems, all around Chinese and Japanese and mostly on Windows. For example, it becomes impossible to remove some models, or spacy itself, once Morphman has loaded the packages. This is because Windows will not let you delete files that are in use, and Anki has already loaded some DLLs from the packages. My intent all along was to create an easy-to-use package manager that did not require the user to install Python. But installing packages and loading them while Anki is already running is turning out to be a large pain. I am partially considering ditching the GUI package manager and instead just giving users instructions on how to install packages via pip.

nlovell1 commented 3 years ago

I don't think it would be all that bad. I would imagine the users that would benefit from the more precise Japanese parsing wouldn't mind running cmd.

I'd like to start investigating the other issue I posted. I am a beginner at troubleshooting/testing. What exactly does setting up a dev environment consist of, and how is it different from just running Anki as a normal user?

rteabeault commented 3 years ago

The simplest thing to do is to copy the contents of <this_repo>/src to <your_anki_home>/addons21/AnkiSpacy. You can then just modify the code and run anki as normal to test. Alternatively you can symlink from your addons21 directory to wherever src is. Let me know if you have any questions.
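The copy step can also be scripted. This is a minimal sketch, assuming you pass in the path to your checkout's src directory and to your addons21 directory; the function name `install_dev_copy` is hypothetical.

```python
import shutil
from pathlib import Path


def install_dev_copy(repo_src, addons_dir, addon_name="AnkiSpacy"):
    """Copy the contents of repo_src into addons_dir/addon_name."""
    destination = Path(addons_dir) / addon_name
    # dirs_exist_ok (Python 3.8+) lets repeated runs overwrite an earlier copy.
    shutil.copytree(repo_src, destination, dirs_exist_ok=True)
    return destination
```

Rerun it after each code change and restart Anki to pick up the new files. A symlink avoids the repeated copy, but on Windows it needs the same elevated privilege discussed earlier in this thread.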

nlovell1 commented 3 years ago

Okay. I think I've homed in on a problem I was having. When I recalc cards with a Spacy-based morphemizer (after generating a frequency list from a study plan in the Readability Analyzer), frequency.txt is not being read, so no cards get the 'mm_FrequencyList' tag. It seems that this error occurs regardless of the morphemizer that I use, as it fails with both MeCab and Spacy. Moreover, MeCab has no trouble reading a frequency.txt generated by Spacy. I have no idea why this is happening: if Spacy were broken, recalc wouldn't work at all. It looks like a bug in reading frequency.txt and then tagging cards and reordering their priority. What do you think?

nlovell1 commented 3 years ago

Ok. It seems the problem is, when using Spacy, the master_freq counter in frequency.txt is not updating. Still unsure as to why.

nlovell1 commented 3 years ago

Update #2. The new models seem to work when frequency.txt is generated with the readability tool WITH THE MODEL that you want to read it with. This means previously generated frequency lists (typically made with MeCab) are incompatible. I'm not sure why this is. It probably has something to do with how spacy lists results or classifies parses, but I haven't figured it out further.

Despite this, morphman does not tag whether a card is on your frequency list or not.

Additionally, it feels that the cards aren't adhering to the study plan as well as I might have remembered (this might totally be me though, I wasn't particularly familiar with the scoring algorithm before I looked at the code, so maybe it's working the same way as before).