mortii / anki-morphs

A MorphMan fork rebuilt from the ground up with a focus on simplicity, performance, and a codebase with minimal technical debt.
https://mortii.github.io/anki-morphs/
Mozilla Public License 2.0

ankimorphs-japanese-mecab known morphs count highly inaccurate #213

Closed (AdhithyaaSaravanan closed this 3 months ago)

AdhithyaaSaravanan commented 3 months ago

Describe the bug

I recently switched to AnkiMorphs and ankimorphs-japanese-mecab, and my known morphs count (2.2k with the original mecab addon) went up to 7k. I've played around and tried analysing various note types, and it seems only one of them is being parsed wrong.

Steps to reproduce the behavior

I don't know if it can be reproduced since it seems highly specific to my Anki collection. I've got 3 note types I wanna parse:

The note type that (I think) it gets wrong is the "Japanese Jo-Mako Morphman Audio". I've included screenshots below to explain it the best I can.

Expected behavior

Ideally, the known morphs count would be almost the same as parsed by the original mecab addon. An error margin of + or - a hundred is understandable.

Screenshots

The original Morphman with the Mecab addon:

All 3 note types parsed: All_3

Only N5 and JP Mining Note: Excluding_Jo_Mako

Only N5: Only_N5

Ankimorphs with ankimorphs-japanese-mecab:

All 3 note types parsed: All_3

Only N5 and JP Mining Note: Excluding_Jo_Mako

Only N5: Only_N5

My setup

Additional context

I know this issue seems highly specific, so I'm happy to provide more context if needed. I also wasn't sure whether this is the right place to post the bug. I'm sorry if I was supposed to post it on the ankimorphs-japanese-mecab GitHub issues page. I posted it here because this page seemed more active. Thank you for your time!

mortii commented 3 months ago

@AdhithyaaSaravanan thanks for the feedback! The discrepancy definitely does look too big.

I fixed a bug in the mecab morphemizer which makes it recognize more morphs than MorphMan: https://github.com/mortii/anki-morphs/blob/4569ee9e36438ab929fbd9f65a80104bac993313/ankimorphs/mecab_wrapper.py#L117-L120

but probably not to the extent you are seeing.

I suspect that this might be due to a previous recalc you did using different settings, which gave a lot of cards the am-known-automatically tags, which then makes AnkiMorphs think you know a lot more morphs than you actually do.

What happens if you remove the am-known-automatically tag from all your cards and then run recalc again? (You can do this by going to Browse -> select all your cards -> right click the am-known-automatically tag in the left sidebar -> remove from all selected notes)
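For anyone who would rather script this than click through the browser, here is a minimal sketch of the same tag removal. It assumes `col` is an open Anki collection object (for example `mw.col` in Anki's debug console) exposing `find_notes` and `tags.bulk_remove`. The helper name `remove_tag_everywhere` is made up for illustration and is not part of AnkiMorphs. Back up your collection before trying it.

```python
def remove_tag_everywhere(col, tag="am-known-automatically"):
    """Remove `tag` from every note that has it; return how many notes matched.

    `col` is assumed to be an Anki collection (e.g. `mw.col`); this helper is
    hypothetical and not part of AnkiMorphs itself.
    """
    # Anki search syntax: "tag:<name>" finds notes carrying that tag.
    note_ids = list(col.find_notes(f"tag:{tag}"))
    if note_ids:
        # bulk_remove takes the note ids and a space-separated string of tags.
        col.tags.bulk_remove(note_ids, tag)
    return len(note_ids)


# Assumed usage inside Anki's debug console:
#   from aqt import mw
#   remove_tag_everywhere(mw.col)
```

After running it you would still need to run recalc again, as described above, so that the known morphs count is rebuilt without the stale tags.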

AdhithyaaSaravanan commented 3 months ago

You were spot on! Solved

I had a feeling I was doing something dumb. And it looks like I know 200 more words than I thought lol.

Also, do you have any more advice on moving from Morphman to AnkiMorphs? I have a lot of cards with the "mm_alreadyKnown" tag which are technically new. I'm assuming AnkiMorphs doesn't take this into account automatically. Any other things I should be aware of?

I was also wondering if anything would break if I switched between Morphman and AnkiMorphs, should I ever want to. (I know Morphman is broken, I have weird reasons)

mortii commented 3 months ago

Great! You are far from the only person that has encountered that problem, so we are working on making it easier to fix (#209).

There is a section in the guide about transitioning from morphman which might be useful: https://mortii.github.io/anki-morphs/user_guide/faq.html#transitioning-from-morphman

Let me know if you have any other problems or suggestions!

AdhithyaaSaravanan commented 3 months ago

Thnx! Also thnx for all your service to the community (for free!!!). Much appreciated.

github-actions[bot] commented 2 months ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.