mortii / anki-morphs

A MorphMan fork rebuilt from the ground up with a focus on simplicity, performance, and a codebase with minimal technical debt.
https://mortii.github.io/anki-morphs/
Mozilla Public License 2.0

Anki-morphs keeps showing cards with all known morphs #228

Closed AdhithyaaSaravanan closed 2 months ago

AdhithyaaSaravanan commented 2 months ago

Describe the bug

The title is self-explanatory. I keep getting an overwhelming number of cards with only known morphs. I know the documentation says it gets better with more data, but I've suspended a few hundred cards (mostly stutters and words that are not really words) and also tagged a few hundred as known. The accuracy doesn't seem to improve. The known morph count also keeps increasing a lot because of this, which shouldn't happen (because they're supposed to be already known). I don't know if this is relevant, but I switched from MorphMan, so I already had a lot of known morphs. This might just be me missing something and being dumb again, in which case, do enlighten me.

Expected behavior

Ideally, it should show me mostly i+1 cards.

Additional context

My setup is exactly the same as in the previous issue I opened: https://github.com/mortii/anki-morphs/issues/213. I tried putting off Anki-morphs for a while because of this, but it'll make my life so much easier, and I really want it to work like it should.

mortii commented 2 months ago

I need to know which settings you are using. Go to Tools -> Add-ons -> Select AnkiMorphs -> Click the "Config" button, and then you should see something like:

{
    "filters": [
        {
            "extra_highlighted": true,
            ...
            ...

Copy the entire thing and paste it here.

AdhithyaaSaravanan commented 2 months ago

Here's the whole thing from the Config Button:

{
    "filters": [
        {
            "extra_highlighted": false,
            "extra_score": false,
            "extra_unknowns": false,
            "extra_unknowns_count": false,
            "field": "Expression",
            "modify": false,
            "morph_priority": "ja-freq-anime-mecab.csv",
            "morphemizer_description": "AnkiMorphs: Japanese",
            "note_type": "JLPT Tango N5 Japanese",
            "read": true,
            "tags": {
                "exclude": [],
                "include": []
            }
        },
        {
            "extra_highlighted": false,
            "extra_score": false,
            "extra_unknowns": false,
            "extra_unknowns_count": false,
            "field": "Sentence",
            "modify": false,
            "morph_priority": "ja-freq-anime-mecab.csv",
            "morphemizer_description": "AnkiMorphs: Japanese",
            "note_type": "JP Mining Note",
            "read": true,
            "tags": {
                "exclude": [],
                "include": []
            }
        },
        {
            "extra_highlighted": false,
            "extra_score": false,
            "extra_unknowns": false,
            "extra_unknowns_count": false,
            "field": "Expression",
            "modify": true,
            "morph_priority": "ja-freq-anime-mecab.csv",
            "morphemizer_description": "AnkiMorphs: Japanese",
            "note_type": "Japanese Jo-Mako Morphman Audio",
            "read": true,
            "tags": {
                "exclude": [],
                "include": []
            }
        }
    ],
    "preprocess_ignore_bracket_contents": false,
    "preprocess_ignore_names_morphemizer": true,
    "preprocess_ignore_names_textfile": false,
    "preprocess_ignore_round_bracket_contents": false,
    "preprocess_ignore_slim_round_bracket_contents": false,
    "preprocess_ignore_suspended_cards_content": true,
    "recalc_due_offset": 500000,
    "recalc_interval_for_known": 21,
    "recalc_move_known_new_cards_to_the_end": true,
    "recalc_number_of_morphs_to_offset": 100,
    "recalc_offset_new_cards": false,
    "recalc_on_sync": false,
    "recalc_read_known_morphs_folder": false,
    "recalc_suspend_known_new_cards": false,
    "recalc_toolbar_stats_use_known": false,
    "recalc_toolbar_stats_use_seen": true,
    "recalc_unknowns_field_shows_inflections": true,
    "recalc_unknowns_field_shows_lemmas": false,
    "shortcut_browse_all_same_unknown": "Shift+L",
    "shortcut_browse_ready_same_unknown": "L",
    "shortcut_browse_ready_same_unknown_lemma": "Ctrl+Shift+L",
    "shortcut_frequency_file_generator": "Ctrl+Shift+F",
    "shortcut_generators": "Ctrl+Shift+G",
    "shortcut_known_morphs_exporter": "Ctrl+Shift+E",
    "shortcut_learn_now": "Ctrl+Alt+N",
    "shortcut_readability_report_generator": "Ctrl+Shift+R",
    "shortcut_recalc": "Ctrl+M",
    "shortcut_set_known_and_skip": "K",
    "shortcut_settings": "Ctrl+O",
    "shortcut_view_morphemes": "Ctrl+Alt+V",
    "skip_only_known_morphs_cards": true,
    "skip_show_num_of_skipped_cards": true,
    "skip_unknown_morph_seen_today_cards": true,
    "tag_known_automatically": "am-known-automatically",
    "tag_known_manually": "am-known-manually",
    "tag_learn_card_now": "am-learn-card-now",
    "tag_not_ready": "am-not-ready",
    "tag_ready": "am-ready"
}

I'm happy to provide any more details.

mortii commented 2 months ago

Okay, so the only cards that have their due order changed are those with the note type "Japanese Jo-Mako Morphman Audio", since it's the only filter that has modify enabled.

You also have these activated, so you shouldn't be seeing any new cards that only have known morphs on them:

"skip_only_known_morphs_cards": true,
"recalc_move_known_new_cards_to_the_end": true,

Do you have any examples of cards that you shouldn't be seeing?
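To make that reading of the config concrete, here is a minimal Python sketch (not AnkiMorphs code) that lists which note types are only read and which ones are also modified. The `CONFIG_JSON` string is a trimmed copy of the config posted above, and the `summarize` helper is a hypothetical name used only for illustration.

```python
import json

# Trimmed copy of the config posted above; only the keys used here are kept.
CONFIG_JSON = """
{
    "filters": [
        {"note_type": "JLPT Tango N5 Japanese", "field": "Expression", "read": true, "modify": false},
        {"note_type": "JP Mining Note", "field": "Sentence", "read": true, "modify": false},
        {"note_type": "Japanese Jo-Mako Morphman Audio", "field": "Expression", "read": true, "modify": true}
    ],
    "skip_only_known_morphs_cards": true,
    "recalc_move_known_new_cards_to_the_end": true
}
"""


def summarize(config):
    """Return (note types that are read, note types that are modified)."""
    read = [f["note_type"] for f in config["filters"] if f["read"]]
    modified = [f["note_type"] for f in config["filters"] if f["modify"]]
    return read, modified


config = json.loads(CONFIG_JSON)
read_types, modified_types = summarize(config)
print("read:", read_types)
print("modified:", modified_types)
```

With this config, all three note types contribute known morphs, but only "Japanese Jo-Mako Morphman Audio" cards get reordered.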

AdhithyaaSaravanan commented 2 months ago

Here are a few examples:

[Screenshots 1-4: example cards]

I don't know how much context you have if you haven't learnt Japanese, but words like these are extremely basic and frequent. I'm quite sure they've occurred in a LOT of my previous cards as one of the known morphs. Also, I'm ignoring the Morph field because I know Anki-morphs doesn't use it.

mortii commented 2 months ago

Do those cards have the "Japanese Jo-Mako Morphman Audio" note type?

AdhithyaaSaravanan commented 2 months ago

Yeh I picked them out using this expression:

[Screenshot: search expression]

mortii commented 2 months ago

Okay, remove the "tag:xxx" part of the search, and then find the first card that has a known morph that isn't correctly registered as known.

AdhithyaaSaravanan commented 2 months ago

Ok, this is just me scrolling through the first few (new) cards of the note type and selecting a few that have all known morphs:

[Screenshots 5, 7, 8, 9: example cards]

Coincidentally they all have the "am-ready" tag, which.........they shouldn't? Also I know I don't have proof that I know these words. You're gonna have to take my word for it lol.

mortii commented 2 months ago

If you don't have sensitive information in your card collection, could you upload it so I can try to reproduce the problem? The "collection.anki2" file found in the Anki profile folder would be ideal.

AdhithyaaSaravanan commented 2 months ago

Is that file enough? Or do you need me to export the decks with scheduling info?

mortii commented 2 months ago

collection.anki2 contains everything except the pictures and audio, so that file has everything I need :+1:

AdhithyaaSaravanan commented 2 months ago

GitHub doesn't seem to support the file type 🥲 Is there any other way I can send it to you?

mortii commented 2 months ago

Any free file hosting service is fine: Google Drive, MEGA, etc.

AdhithyaaSaravanan commented 2 months ago

Here's a link to the collection: https://drive.google.com/file/d/1rusj44f7ccAeTqHl4dZNFXdI1rC1v_ep/view?usp=sharing

Let me know if you need anything else

mortii commented 2 months ago

I applied your settings and recalculated, and it honestly looks fine to me. 進化 is in row 4221 of the frequency file, which matches your U and A numbers fairly well.

Screenshot from 2024-04-23 21-04-09

All of those cards are in the "Sub2SRS Bank", and the deck settings are 0 new cards a day. I changed it to 10 just to test it, and clicked "study now" and I got the cards I expected, all of which have rare morphs as their "focus morph", so I don't see anything wrong...
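For anyone who wants to check where a morph sits in a frequency file themselves, here is a rough Python sketch. It assumes the morph is in the first column and that rows are ordered from most to least frequent; the real layout of ja-freq-anime-mecab.csv may differ (header row, extra columns). `SAMPLE` is toy data and `row_of` is a hypothetical helper, neither comes from AnkiMorphs.

```python
import csv
import io

# Toy stand-in for a frequency file: one morph per row, most frequent first.
# This is NOT the content of ja-freq-anime-mecab.csv, just illustrative data.
SAMPLE = "の\nに\nは\n進化\n"


def row_of(morph, csv_text):
    """Return the 1-based row number of `morph`, or None if it is absent."""
    reader = csv.reader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        # Skip blank rows; compare against the first column only.
        if row and row[0] == morph:
            return row_number
    return None


print(row_of("進化", SAMPLE))
print(row_of("猫", SAMPLE))
```

For a real file, the text could be read with `open(path, encoding="utf-8").read()` and passed in the same way.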

AdhithyaaSaravanan commented 2 months ago

Sorry for the late reply. So yes, some of the cards are i+1, but a lot of them aren't. I went through the next 20 new cards and, fair enough, I got 進化 as the first one, which is fine. But I've listed below some of the card IDs that contain all known morphs.

  1. Apparent unknown morph: 飼う
     New card that treats it as unknown: 1711327789008
     Reviewed cards with the morph: 1711327786505, 1713286397652

  2. Apparent unknown morph: 重い
     New card that treats it as unknown: 1711327792640
     Reviewed cards with the morph: 1709860015007, 1588403864544

  3. Apparent unknown morph: 撃つ
     New card that treats it as unknown: 1711327794070
     Reviewed cards with the morph: 1692678690058

  4. Apparent unknown morph: 甘い
     New card that treats it as unknown: 1711327791376
     Reviewed cards with the morph: 1588301331705

I'm pretty sure I got all the "Apparent unknown morphs" correct. Even if I didn't, I'm confident you'll find reviewed cards that contain all the other words in those cards anyway. I've also only listed 4 cards; I think there were a few more among the 20 cards I tested.

But as I was going through these to pick them out, I did notice that some of the reviewed cards in the "N5 Deck" have formatting in the parsed field, for other purposes. I was wondering whether that prevented Anki-morphs from parsing all the words properly. If that is the case, I'm surprised I didn't have this issue with MorphMan.

But in some cases these words were also present in reviewed cards in the other decks, and they have clear sentences in the parsed field, so I'm not sure what's going on.

mortii commented 2 months ago

Ah, I see where the confusion stems from. If you activate the am-unknowns extra field, you will see the "focus morph" that AnkiMorphs determines the card has. The "Morph" field you have set up is actually misleading.

1711327789008 has the unknown morph 飼っ not 飼う

1711327792640 has 重く not 重い

1711327794070 has 撃っ not 撃つ

1711327791376 has 任せろ not 甘い
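The mismatch can be illustrated with a small Python sketch. This is not how AnkiMorphs is implemented; it only shows how the same cards come out as "unknown" or "known" depending on whether morphs are compared by inflection or by lemma. The card IDs and morphs are the first three from this thread; the assumption that the reviewed cards contained the dictionary forms is mine, and `unknown_card_ids` is a hypothetical helper.

```python
# Each morph is a (lemma, inflection) pair.
# Assumption: the already-reviewed cards contained the dictionary forms.
reviewed = [("飼う", "飼う"), ("重い", "重い"), ("撃つ", "撃つ")]

# New cards from this thread and the morph found on each of them.
new_cards = {
    1711327789008: ("飼う", "飼っ"),
    1711327792640: ("重い", "重く"),
    1711327794070: ("撃つ", "撃っ"),
}


def unknown_card_ids(cards, reviewed_morphs, by_lemma):
    """Return IDs of cards whose morph is not among the reviewed morphs."""
    index = 0 if by_lemma else 1  # 0 = lemma, 1 = inflection
    known = {morph[index] for morph in reviewed_morphs}
    return [card_id for card_id, morph in cards.items() if morph[index] not in known]


unknown_by_inflection = unknown_card_ids(new_cards, reviewed, by_lemma=False)
unknown_by_lemma = unknown_card_ids(new_cards, reviewed, by_lemma=True)
print("unknown when comparing inflections:", unknown_by_inflection)
print("unknown when comparing lemmas:", unknown_by_lemma)
```

Compared by inflection, all three cards still carry an unknown morph; compared by lemma, none of them do.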

AdhithyaaSaravanan commented 2 months ago

Ahh I see....ok so Anki-morphs parses each conjugation of the verb as a separate word, even though they're the same word in dictionary form.

For example, 1. 重く is just the negative stem of 重い (or so I believe I'm still learning 👀). It's basically "heavy" and the stem of "not heavy".

The first 3 words follow the same pattern.

For the 4th unknown morph that I got wrong, I found another reviewed card with the right morph: 1609263430437, but I guess technically it's "unknown" to Anki-morphs because it's a different conjugation?

So is there a way to mark all forms of a word as known when you encounter it once in any form? For example, if I reviewed the verb "to eat", words like "ate" and "eaten" would all be considered known.

I found these settings in Anki and in the docs:

[Screenshot: settings]

But the way it's worded, it sounds like it'll only change the way Anki-morphs shows the word, and doesn't affect the actual algorithm. But is this what I'm looking for?

mortii commented 2 months ago

> So is there a way to mark all forms of a word as known when you encounter it once in any form?

This will be added in AnkiMorphs v3, we are discussing it in #191.

> But the way it's worded, it sounds like it'll only change the way Anki-morphs shows the word, and doesn't affect the actual algorithm.

That's right, it only changes how the morph is displayed; it won't change the "unknowns".

> For example, 1. 重く is just the negative stem of 重い (or so I believe I'm still learning 👀). It's basically "heavy" and the stem of "not heavy".

Ah, no, that is wrong in a fundamental way. This is actually why I recommend not using the option that will be introduced in v3, because errors like this will fall through the cracks, and you will then be left with a false sense of competence.

AdhithyaaSaravanan commented 2 months ago

Yeah I had a feeling I was wrong lol, that's why I put a disclaimer.

I’m gonna briefly explain why I’d still want it the way I want.

I don’t think it’s really necessary to study grammar consciously, especially in Anki, to acquire all aspects of a language. Why? Because I acquired English 100% through raw immersion and my grammar isn’t half bad now. (It’s not my 1st language, and I wasn’t in an English-speaking environment until I was an adult).

This is why my biggest goal is increasing comprehension as fast as possible.

My timeline is:

Learn the most common words asap -> more comprehension -> more motivation to immerse -> more gains

So basically, replicating the same thing I did for English.

I’m also not really skipping grammar, because I still mine conjugated words that I don’t understand (that I’m supposed to know the lemma of). Or look up grammar points as I go.

So, for the above-mentioned reasons, I only rate cards based on comprehension. My primary source of cards is immersion, Anki-morphs is just to get a few extra cards sometimes. So, it’s especially frustrating to skip through 50 cards to get the 5 cards I want, since I want to reduce Anki time and increase immersion.

I don’t disagree with you per se, my intuition for Japanese grammar is obviously still very low, but I just started immersing 3 months ago, and it can only get better with more time / exposure.

So, I’m definitely interested in Anki-morphs v3. So it’d be nice to know the timeline for it? Also wanna thank you again for being so active and helping the language learning community. I’d be so lost as to why this was happening, if not for this long ass convo 😭

mortii commented 2 months ago

> So it’d be nice to know the timeline for it?

Probably somewhere between 2-4 months.

I'll close this issue now since we managed to resolve it. Best of luck in your studies!
