Closed mdraves91 closed 1 month ago
Thanks for the detailed report!
'た' is technically a morph in this context, so it's not incorrect per se; the problem is rather that the morph splitting is inconsistent with other verbs.
> I also tried both the Mecab and spaCy morphemizers
I suspect that this is an issue with the morphemizers themselves (which I have no control over), not with how the text is being read from Anki, but I'll check once I'm done with v3 :+1:
I just tested this on the morphemizers directly without using Anki, and it still gives the same result, so this is an upstream problem, sorry :/
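For anyone who wants to repeat that kind of check, here is a minimal sketch of running a tokenizer on the example sentence outside Anki. It is not the code used for the test above. The helper names `surface_forms` and `load_spacy_japanese` are made up for illustration, and it assumes spaCy and its `ja_core_news_sm` model are installed locally; the model actually used by AnkiMorphs may differ.

```python
from typing import Callable, Iterable, List


def surface_forms(tokenize: Callable[[str], Iterable], sentence: str) -> List[str]:
    """Return the surface text of each token the tokenizer produces.

    `tokenize` is any callable that yields objects with a `.text`
    attribute, such as a loaded spaCy pipeline.
    """
    return [token.text for token in tokenize(sentence)]


def load_spacy_japanese(model_name: str = "ja_core_news_sm"):
    """Load a spaCy Japanese pipeline (assumption: the model is installed)."""
    import spacy  # imported lazily so surface_forms works without spaCy

    return spacy.load(model_name)


# Example usage (requires spaCy and the Japanese model):
#   nlp = load_spacy_japanese()
#   print(surface_forms(nlp, "あたりが静まりかえった。"))
```

If the printed list shows the verb split after っ, the split is coming from the tokenizer itself rather than from how the text is read out of Anki.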
Describe the bug
When ankimorphs analyzes Japanese sentences, it splits a lot of conjugated verbs into separate morphs on the character っ.
Steps to reproduce the behavior
Analyze a set of Japanese sentences and you will see many verbs (especially the -te form and -ta form) that are broken into two morphs on the character っ.
For example, in the sentence
あたりが静まりかえった。
ankimorphs considers the verb 静まりかえった as two morphs, 静まりかえっ and た, when it should just be one. This results in a lot of cards where the am-unknowns field has a verb that cuts off at っ.
In my deck of 1383 sentences, about 168 of them have a verb that is broken like this in the am-unknowns field. (I searched for them with `tag:am-* (am-unknowns:*っ,* OR am-unknowns:*っ)`; there are a small handful of false positives where a valid word did end in っ.)
Expected behavior
Verbs should parse as just one morph instead of two. Other tools correctly parse the verb, such as the jisho.org dictionary: jisho.org/あたりが静まりかえった。
Screenshots
My AnkiMorphs settings
My system
Additional context
I saw this behavior on the stable version of ankimorphs as well. I also tried both the Mecab and spaCy morphemizers.
I can provide more examples or upload my deck if that would help.
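In case it helps with triage, the Anki search I used can be roughly approximated in Python once the am-unknowns values are exported. This is only a sketch: the function names are hypothetical, and it assumes the field holds a comma-separated list of morphs, which is what my search pattern relies on. Like the search, it will also flag valid words that genuinely end in っ.

```python
from typing import Iterable


def ends_with_sokuon(unknowns_field: str, separator: str = ",") -> bool:
    """Return True if any morph in the field ends in the small っ.

    Assumption: the field is a separator-delimited list of morphs.
    """
    morphs = (m.strip() for m in unknowns_field.split(separator))
    return any(m.endswith("っ") for m in morphs if m)


def count_suspect_notes(unknowns_fields: Iterable[str]) -> int:
    """Count field values that contain at least one morph ending in っ."""
    return sum(1 for field in unknowns_fields if ends_with_sokuon(field))


# Example usage with hand-written field values:
#   count_suspect_notes(["静まりかえっ", "食べる", "猫, 静まりかえっ"])
```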