kaegi / MorphMan

Anki plugin that reorders language cards based on the words you know

Add support for spaCy. #231

Closed rteabeault closed 1 year ago

rteabeault commented 3 years ago

Fixes #162

rteabeault commented 3 years ago

@ianki, @thinkingbox12 New pull request for spaCy support. Tests are passing locally for me but failing in the automated build. I will look into that shortly, but I wanted you both to take a look at this.

nlovell1 commented 3 years ago

Sorry for the delayed reply. It seems to me that this new version is not communicating with the spacy addon. When going into morphman to change the preferences for recalc, it could not see the installed models. I tried this on a fresh profile as well.

Let me know if there's anything else besides that specifically you want checked.

rteabeault commented 3 years ago

@thinkingbox12 did you read the instructions in the readme? This does not use the spacy addon that I wrote.

nlovell1 commented 3 years ago

Nope. My fault. Will read the readme and try again tomorrow.

rteabeault commented 3 years ago

I have installed Ubuntu 18.04 and Python 3.7 and am still unable to reproduce this test failure. I will continue to investigate.

nlovell1 commented 3 years ago

Will try ubuntu today. Got caught up with other things, sorry about the delay.

nlovell1 commented 3 years ago

Well, everything seems to be working fine for me on Ubuntu 20.04.1 LTS (64-bit) with Python 3.8.5. I could download the model and link it properly from the terminal. Not sure if this has anything to do with it, but I kept the old spaCy package manager add-on in the profile. I don't think it makes a difference, though, because Python couldn't see the models I had previously installed through that package manager. I could recalc properly with a few Japanese notes, and the morph count updated. The contents of known.db also made sense to me. TL;DR: everything is good on my end. Sorry again for the delay.

rteabeault commented 3 years ago

Tests fixed. @ianki Can you please take a look? Thanks!

nlovell1 commented 3 years ago

Any updates on merging this into the default MorphMan version?

nlovell1 commented 3 years ago

I'm getting this exception on Windows some time after a fresh install. @rteabeault, any guesses?

Error
An error occurred. Please start Anki while holding down the shift key, which will temporarily disable the add-ons you have installed.
If the issue only occurs when add-ons are enabled, please use the Tools > Add-ons menu item to disable some add-ons and restart Anki, repeating until you discover the add-on that is causing the problem.
When you've discovered the add-on that is causing the problem, please report the issue on the add-on support site.
Debug info:
Anki 2.1.35 (84dcaa86) Python 3.8.0 Qt 5.14.2 PyQt 5.14.2
Platform: Windows 10
Flags: frz=True ao=True sv=1
Add-ons, last update check: 2021-03-18 23:12:43

Caught exception:
Traceback (most recent call last):
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\__init__.py", line 17, in onMorphManRecalc
    main.main()
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\main.py", line 573, in main
    allDb = mkAllDb(cur)
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\main.py", line 195, in mkAllDb
    ms = getMorphemes(morphemizer, fieldValue, ts)
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\morphemes.py", line 166, in getMorphemes
    ms = morphemizer.getMorphemesFromExpr(expression)
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\morphemizer.py", line 51, in getMorphemesFromExpr
    morphs = self._getMorphemesFromExpr(expression)
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\deps\spacy\morphemizer.py", line 40, in _getMorphemesFromExpr
    self.proc.stdin.flush()
OSError: [Errno 22] Invalid argument

nlovell1 commented 3 years ago

Another exception: I get either this one or the previous one. I tried reinstalling spaCy and the models many times, with no luck. Is spaCy support still being actively developed? I've been looking into some cool Japanese features in the meantime.

Caught exception:
Traceback (most recent call last):
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\__init__.py", line 17, in onMorphManRecalc
    main.main()
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\main.py", line 573, in main
    allDb = mkAllDb(cur)
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\main.py", line 195, in mkAllDb
    ms = getMorphemes(morphemizer, fieldValue, ts)
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\morphemes.py", line 166, in getMorphemes
    ms = morphemizer.getMorphemesFromExpr(expression)
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\morphemizer.py", line 51, in getMorphemesFromExpr
    morphs = self._getMorphemesFromExpr(expression)
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\deps\spacy\morphemizer.py", line 41, in _getMorphemesFromExpr
    morphs = json.loads(self.proc.stdout.readline())
  File "json\__init__.py", line 357, in loads
  File "json\decoder.py", line 337, in decode
  File "json\decoder.py", line 355, in raw_decode
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
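
A note on reading this traceback, offered as a sketch under one assumption: that the helper process had already exited (or printed nothing) when `readline()` was called. In that case `readline()` on the pipe returns an empty string, and `json.loads("")` raises exactly the message shown above. The `parse_reply` helper below is hypothetical and is not part of MorphMan; it only illustrates how an empty reply could be reported more clearly.

```python
import json


def parse_reply(line: str):
    """Parse one line of JSON read from a helper process (hypothetical helper)."""
    if not line.strip():
        # An empty line usually means the child wrote nothing or has exited.
        raise RuntimeError("helper process returned no output; it may have exited")
    return json.loads(line)


# Reproduce the error message from the traceback with an empty reply.
try:
    json.loads("")
except json.JSONDecodeError as exc:
    message = str(exc)

print(message)
```

With a guard like this, the failure points at the helper process rather than at the JSON decoder.
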

nlovell1 commented 3 years ago

@rteabeault The problem above results from a terminal encoding problem on Windows (with the Japanese model, specifically). My instinct tells me this behavior comes from a new Windows feature update: the terminal is effectively displaying Japanese characters as the Unicode 'unknown character' glyph, so when they get passed through to SudachiPy, it fails and an exception results.

I suspect that changing the region and locale to Japan, so that the terminal supports UTF-8 and Japanese glyphs, might solve the problem, but this has not been tested yet, and it is probably not a practical solution for most users of this add-on.
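
As an alternative to changing the system locale, the pipes themselves can be forced to UTF-8. The following is a minimal sketch, assuming the morphemizer talks to a Python child process over stdin/stdout with one JSON value per line (which is what the tracebacks above suggest). The child script here is a hypothetical stand-in that splits on whitespace; it is not the actual spaCy helper from this pull request, and this has not been tested against the Windows failure described above.

```python
import json
import os
import subprocess
import sys

# Hypothetical stand-in for the spaCy helper script: it reads one JSON string
# per line and replies with a JSON list. Splitting on whitespace only stands
# in for real tokenization.
CHILD_SOURCE = r"""
import json, sys
for line in sys.stdin:
    text = json.loads(line)
    print(json.dumps(text.split(), ensure_ascii=False), flush=True)
"""

# Ask the child interpreter to use UTF-8 for its standard streams,
# whatever the console code page or locale happens to be.
child_env = dict(os.environ, PYTHONIOENCODING="utf-8")

proc = subprocess.Popen(
    [sys.executable, "-c", CHILD_SOURCE],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    encoding="utf-8",  # the parent side of both pipes uses UTF-8 as well
    env=child_env,
)

# Send one request line and read one reply line.
proc.stdin.write(json.dumps("日本語 の 文", ensure_ascii=False) + "\n")
proc.stdin.flush()
morphs = json.loads(proc.stdout.readline())

# Closing stdin ends the child's read loop so it can exit cleanly.
proc.stdin.close()
proc.wait()
proc.stdout.close()

print(morphs)
```

Leaving `ensure_ascii` at its default of `True` in both `json.dumps` calls would be another option, since the data on the pipe would then be plain ASCII regardless of encoding settings.
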

The most recent version of this repo works just fine on Ubuntu.

I am interested in development for spaCy 3.0, which might simplify the link process, since that process was revamped and the old one is considered obsolete. AFAIK some of the syntax has changed slightly, so the current code doesn't work with it.

EDIT

Oddly enough though, on Ubuntu, when upgrading from sudachipy 0.4.5 (which worked) to 0.4.9, I got the same exception that I did on Windows. Upgrading once again on Ubuntu to 0.5.2 resolved the issue. Is this coincidental?

RawToast commented 3 years ago

Just wondering, if spaCy provides better analysis than MeCab, then perhaps it would be better as a new add-on? I almost see MorphMan as an add-on for Japanese and not other languages (that all came later).

There's a lot in this repo, and maybe by replacing MeCab and only using spaCy, lots of code could be removed and the codebase simplified?

ianki commented 2 years ago

Hey guys, sorry for the long wait on this. What's the current state of this support? Should I look to merge this?

ianki commented 2 years ago

I was able to rebase this, and it seems to work OK after fixing the handling of newlines in the expressions.
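
The thread does not show what the newline fix looked like, so the following is only a sketch of one way such a problem can arise and be avoided, assuming a one-message-per-line protocol between MorphMan and the helper process. The names `frame` and `unframe` are hypothetical. A raw expression containing a newline would look like two messages to a reader calling `readline()`, whereas JSON-encoding it escapes the newline so the message stays on one line.

```python
import json


def frame(expression: str) -> str:
    """Encode an expression as exactly one line (hypothetical helper)."""
    # json.dumps escapes "\n" inside the string as the two characters \ and n.
    return json.dumps(expression, ensure_ascii=False) + "\n"


def unframe(line: str) -> str:
    """Decode one line back into the original expression (hypothetical helper)."""
    return json.loads(line)


expression = "first line\nsecond line"

# Sent raw, the message contains two line terminators, so readline() would
# return only "first line\n" for the first read.
raw = expression + "\n"
raw_terminators = raw.count("\n")

# Sent framed, the only newline is the terminator at the end.
framed = frame(expression)
framed_terminators = framed.count("\n")

print(raw_terminators, framed_terminators, unframe(framed) == expression)
```
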

ghost commented 2 years ago

Hey all... What do I have to do to merge this into my MorphMan installation?

Vilhelm-Ian commented 1 year ago

> Just wondering, if spaCy provides better analysis than MeCab, then perhaps it would be better as a new add-on? I almost see MorphMan as an add-on for Japanese and not other languages (that all came later).
>
> There's a lot in this repo, and maybe by replacing MeCab and only using spaCy, lots of code could be removed and the codebase simplified?

This is not true. People use MorphMan for other languages, and there is no reason why they shouldn't benefit from spaCy.