mortii / anki-morphs

A MorphMan fork rebuilt from the ground up with a focus on simplicity, performance, and a codebase with minimal technical debt.
https://mortii.github.io/anki-morphs/
GNU Affero General Public License v3.0

Chinese spacy models and spacy-pkuseg #183

Closed by ashprice 8 months ago

ashprice commented 8 months ago

I never saw this error before when using Chinese models (I have just a handful of cards). I'm not sure whether it stems from a change in spaCy or AnkiMorphs, or from my local system, but recalc now fails where it used to work.

This issue is mostly a PSA. I'm not sure it even belongs here, but I'm wondering whether anyone else has run into it. I think the problem is that the models are missing, but I don't know enough about Python packaging to know where to put them, and I get the same errors even when using the pip package that is meant to include the models.

Steps to reproduce:

  1. Set up AnkiMorphs to use a model such as zh_core_web_lg
  2. Click on recalc
  3. See error
Anki 24.04 (8c9d7d64) (src) (ao)
Python 3.11.8 Qt 6.6.2 PyQt 6.6.1
Platform: Linux-6.7.9-arch1-1-x86_64-with-glibc2.39

Traceback (most recent call last):
  File "/usr/lib/python3.11/site-packages/aqt/taskman.py", line 142, in _on_closures_pending
    closure()
  File "/usr/lib/python3.11/site-packages/aqt/taskman.py", line 86, in <lambda>
    lambda future: self.run_on_main(lambda: on_done(future))
                                            ^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/aqt/taskman.py", line 106, in wrapped_done
    on_done(fut)
  File "/usr/lib/python3.11/site-packages/aqt/operations/__init__.py", line 252, in wrapped_done
    self._failure(exception)
  File "/home/hearth/.local/share/Anki2/addons21/472573498/recalc.py", line 865, in _on_failure
    raise error
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/aqt/operations/__init__.py", line 242, in wrapped_op
    return self._op(mw.col)
           ^^^^^^^^^^^^^^^^
  File "/home/hearth/.local/share/Anki2/addons21/472573498/recalc.py", line 85, in _recalc_background_op
    _cache_anki_data(am_config)
  File "/home/hearth/.local/share/Anki2/addons21/472573498/recalc.py", line 152, in _cache_anki_data
    nlp = spacy_wrapper.get_nlp(spacy_model)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hearth/.local/share/Anki2/addons21/472573498/spacy_wrapper.py", line 116, in get_nlp
    nlp = spacy.load(spacy_model_name)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/spacy/__init__.py", line 51, in load
    return util.load_model(
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/spacy/util.py", line 465, in load_model
    return load_model_from_package(name, **kwargs)  # type: ignore[arg-type]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/spacy/util.py", line 501, in load_model_from_package
    return cls.load(vocab=vocab, disable=disable, enable=enable, exclude=exclude, config=config)  # type: ignore[attr-defined]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/zh_core_web_lg/__init__.py", line 10, in load
    return load_model_from_init_py(__file__, **overrides)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/spacy/util.py", line 682, in load_model_from_init_py
    return load_model_from_path(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/spacy/util.py", line 547, in load_model_from_path
    return nlp.from_disk(model_path, exclude=exclude, overrides=overrides)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/spacy/language.py", line 2206, in from_disk
    util.from_disk(path, deserializers, exclude)  # type: ignore[arg-type]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/spacy/util.py", line 1390, in from_disk
    reader(path / key)
  File "/usr/lib/python3.11/site-packages/spacy/language.py", line 2192, in <lambda>
    deserializers["tokenizer"] = lambda p: self.tokenizer.from_disk(  # type: ignore[union-attr]
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/spacy/lang/zh/__init__.py", line 283, in from_disk
    util.from_disk(path, serializers, [])
  File "/usr/lib/python3.11/site-packages/spacy/util.py", line 1390, in from_disk
    reader(path / key)
  File "/usr/lib/python3.11/site-packages/spacy/lang/zh/__init__.py", line 280, in <lambda>
    "pkuseg_model": lambda p: load_pkuseg_model(p),
                              ^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/spacy/lang/zh/__init__.py", line 262, in load_pkuseg_model
    self.pkuseg_seg = spacy_pkuseg.pkuseg(path)
                      ^^^^^^^^^^^^^^^^^^^
AttributeError: module 'spacy_pkuseg' has no attribute 'pkuseg'

Expected behavior: a successful recalc.

Desktop: OS: Arch Linux (6.7.9-arch1-1). AnkiMorphs version: 1.3.0.

In trying to solve this I have:

  1. Installed both pkuseg-python and spacy-pkuseg manually, a step that I didn't have to do to get it to work previously. (I have an AUR package ready to go for the latter if I can figure out how to get this to work.)
  2. Done the same in a venv. Everything here has the same issue whether in .venv or not!
  3. Also tried installing the pip version, which supposedly(!) includes the model.
  4. Done some manual testing by downloading the models myself and trying to point it at them.

As for (4), always the same result:

>>> import pkuseg
>>> seg = pkuseg.pkuseg(model_name='./default')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'pkuseg' has no attribute 'pkuseg'

I assume I'm making some really dumb mistake, like importing the model wrong, but I can't figure it out. I should maybe open an issue on the spacy-pkuseg repo, but I haven't yet because I suspect this is user error. Can anyone else reproduce this?

For now I'm just gonna ignore my handful of cards :)

Thanks for your time. :+1:
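
For anyone debugging the same `AttributeError`, here is a minimal sketch (not part of AnkiMorphs or spaCy) of one way to check where Python resolves a module from and whether it exposes the attribute the traceback complains about. The helper name `describe_module` is hypothetical, and it uses only the standard library's `importlib`. An `origin` of `None` typically points to a namespace package, such as a stray directory with the same name as the module, though that is only one possible cause.

```python
import importlib
import importlib.util


def describe_module(name: str, attr: str) -> dict:
    """Report where `name` resolves from and whether it exposes `attr`.

    Hypothetical diagnostic helper; it imports the module as a side effect.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        # The module cannot be found on sys.path at all.
        return {"name": name, "found": False, "origin": None, "has_attr": False}

    module = importlib.import_module(name)
    return {
        "name": name,
        "found": True,
        # Path of the file the module is loaded from; typically None for
        # namespace packages (a directory without an __init__.py).
        "origin": spec.origin,
        "has_attr": hasattr(module, attr),
    }
```

Calling `describe_module("spacy_pkuseg", "pkuseg")` and `describe_module("pkuseg", "pkuseg")` from the same interpreter that Anki uses should show whether the module being imported is the installed package or something shadowing it.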

mortii commented 8 months ago

I just downloaded a fresh "zh_core_web_lg" and it works fine for me (Ubuntu, Python 3.9), so maybe it's an Arch thing?

xofm31 commented 8 months ago

I basically copied and pasted from the AnkiMorphs spaCy directions [https://mortii.github.io/anki-morphs/user_guide/installation/installing-spacy.html#:~:text=macOS-,First%2C%20we%20need%20to%20have%20Python%203.9%20on%20our%20system.%20Go,Now%20those%20spaCy%20models%20should%20be%20available%20as%20morphemizers%20in%20AnkiMorphs!,-Linux], downloaded "zh_core_web_lg", and had no problems.

I'm using: Version 23.12.1 (1a1d4d54), Python 3.9.15, Qt 6.5.3, PyQt 6.5.3

ashprice commented 8 months ago

I made an obvious mistake: I forgot to specify the correct version for spaCy. I guess I assumed that, because the package is tied to spaCy, the current version would work. Either that, or that wasn't the issue and reinstalling everything one more time fixed it somehow.
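
A version mismatch like the one described above can be spotted before running recalc. The following is a minimal sketch, assuming you already know which versions you want to pin. The helper names `installed_versions` and `find_mismatches` are hypothetical, and the expected versions are supplied by the caller, since this thread does not state which spaCy version is required.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def find_mismatches(installed, expected):
    """Return {name: (installed, expected)} for every pin that is not met exactly."""
    return {
        name: (installed.get(name), wanted)
        for name, wanted in expected.items()
        if installed.get(name) != wanted
    }
```

For example, `find_mismatches(installed_versions(["spacy", "spacy-pkuseg"]), expected)` returns an empty dict when every pin in `expected` is met, where `expected` is a dict of the versions given in the installation guide you are following.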
