Check if kana answers contain the kana that are already part of the vocabulary

dlsf commented 9 months ago

Description This is just a personal gripe of mine with Wanikani, but I believe that the app shouldn't accept answers if a user accidentally mistyped something that was already provided in the question.

For example: 夏休み should accept なつやすみ and fail on なつやつみ, but なつやすむ (and everything else not ending in み) should give the user the chance to retry since they clearly just misspelled the last character that has been provided by Wanikani already.

For simplicity's sake, I didn't add a config setting for this behavior, but I can modify the PR if requested. If it's not something that you believe should be added to the app (since it's not a website feature), then that's perfectly fine, but I think it would be a positive addition that could help make reviews less frustrating.

Implementation My implementation makes use of Regex, but the speed shouldn't be an issue here — in fact, you could argue that the for loop just above this addition could be replaced by currentAnswer.matches("(?:\\p{Script=Katakana}|\\p{Script=Hiragana})+") as well for better readability. However, that's beyond the scope of this PR.

We take the word that is being asked for (Kanji + Kana) and replace all Kanji greedily with .+. If we match the user's answer to this new regex string, we can effectively check if everything that's not part of the Kanji's reading is still present in the expected order. What the user replaced the Kanji with is not relevant to us as long as it's not nothing.

Update (more info in the comments): Since the question can contain Katakana that users answer with Hiragana and vice versa, it is necessary to convert all Katakana to Hiragana first. Their ASCII codes are fortunately aligned, so this is trivial.

Drawbacks This change prevents users from submitting a pointless answer to fail the question if the vocabulary question contains Kana. However, since there already is a “Don't know” button to submit an empty answer, this shouldn't be a big deal.

Moreover, it might confuse users who are not familiar with this feature.

voidzero commented 9 months ago

Pretty cool! I've felt the same way but never knew how to put it into words – regardless of the language 😁

dlsf commented 9 months ago

Just realized that this approach does not work with Katakana very well. If the question contains Katakana and you enter Hiragana, it would reject the answer. I will look into it later.

dlsf commented 9 months ago

Alright, I found a solution, even if it isn't as pretty as before. I took the easy route and converted all Katakana in both the answer and subject characters to Hiragana so they can be compared.

This requires the introduction of a new utility method, since the app currently only has a method for converting Hiragana to Katakana. It does the same, except in reverse and also for half-width Katakana. I tested it for every single Katakana character; it even works with half-width Katakana, odd Katakana combinations, as well as special characters such as ー (even if the latter is a bit hacky, see screenshot below).

Screenshot_20240218_164424

It works in all cases, but in most cases, it even works as you would expect: Screenshot_20240218_162729

dlsf commented 8 months ago

I have been doing the majority of my Wanikani reviews with this version of Smouldering Durtles for the past week, and would consider it well-tested at this point

dlsf commented 8 months ago

Alright, I found another edge case, which made me go through the entirety of Wanikani's 6000 vocabulary to find more. This should be all of them.

jerryhcooke commented 8 months ago

This is good work and well done for going through and doing all that work, however. I'm reticent to merge this without an off-by-default config setting created for it.

My reasoning being:

1) This differs from the Wanikani browser behaviour, and I don't want to present a default user experience that differs from Wanikani's handling to that large of a degree 2) Personally, I always use single character entry as a 'don't know' and so I don't want to remove that ability

jerryhcooke commented 8 months ago

Alright, I found another edge case, which made me go through the entirety of Wanikani's 6000 vocabulary to find more. This should be all of them.

Just in reference to this, how resilient is your method to future vocab additions by Tofugu?

dlsf commented 8 months ago

I'm reticent to merge this without an off-by-default config setting created for it.

Sure, I can look into that. In what category should it be placed, and what should the default value be?

Personally, I always use single character entry as a 'don't know' and so I don't want to remove that ability

It's the same for me, but I adjusted pretty quickly. I understand the concern though, a config option would be best.

Just in reference to this, how resilient is your method to future vocab additions by Tofugu?

I honestly can't provide a definite answer to this question, as I'm simply lacking knowledge about the Japanese language and am only level 22 on Wanikani. Considering that it works with all 6000+ vocabulary currently on Wanikani, the chances of it failing seem very slim as, to my knowledge, the number of characters that have an unusual reading are very low. 々 is considered a Kanji, ヶ being read as か in counters has been accounted for now (it only breaks if it is at the end or beginning of a word, which I should probably fix, although Wanikani currently has no such vocabulary), and 〜 as a special counter prefix in Wanikani is ignored. Unless there is a similarly special Japanese character I don't know about or Wanikani introduces another special character like 〜, I doubt that this will break.

dlsf commented 8 months ago

Alright, I had to make one final adjustment. It affected exactly one vocabulary on Wanikani (２０１１年), and I'm very lucky that I got the review today.

Allow me to explain how I missed this. I took the data from https://www.wkstats.com/items/wanikani, which allowed me to easily copy all Wanikani vocabulary separated by a comma for testing purposes. However, since I had to remove the text between the actual vocabulary items for each level, I forgot to put a comma between the first and last vocabulary items of the levels. ２０１１年 just so happened to be the first item of level 4, which interfered with my test. Since the word uses Japanese numerals, it expected Japanese numerals in the input as opposed to Hiragana.

I now fixed the error in the code & test, once more went through every single character manually and programmatically to confirm that only Regex (.+) and Hiragana remain for the check, and am now absolutely confident that this logic covers everything that's currently on Wanikani. I should have checked it programmatically from the beginning. My last remark regarding how future-proof this approach still holds true. Sorry for this mistake; the circumstances were quite unfortunate.

jerryhcooke commented 6 months ago

Hi @dlsf just to note that I'm thankful for the work you've done on this - I'm happy with the behaviour you've put in, but as previously mentioned I'd like to make it an advanced setting and off by default. I'm going to leave the PR open for the time being, just until I'm in a position to work on it and add the setting in.

dlsf commented 6 months ago

@jerryhcooke Thank you for the update! I was about to add a setting myself and let you review the change, but this also works. I'm still using this feature for my personal reviews, and there haven't been any issues in the past ~10 weeks (as expected).

jerryhcooke / smouldering_durtles

Check if kana answers contain the kana that are already part of the vocabulary #84