sschmidTU / mr-kanji-search-wtk

WTK-Search is a Kanji search engine using (multiple) Wanikani radicals or RTK names, on an RTK element dataset of 3000+ Kanji
https://sschmidtu.github.io/mr-kanji-search-wtk/

WK mode: predict radicals (e.g. replace tric with triceratops) #7

Open sschmidTU opened 3 years ago

sschmidTU commented 3 years ago

In RTK mode, partial input is no problem ("recl" will usually lead to the same result as "reclining"). WK-specific radicals, however, are currently only replaced when typed in full, and only then do they (usually) give the desired result: "triceratop" will not find anything, but "triceratops" will.

-> build a system where each radical has a minimal matching regex, e.g. tric[a-z]* for triceratops.
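
A minimal sketch of such a table, written in Python for illustration only (the project's actual code and data structures may look quite different). The names RADICAL_PATTERNS and predict_radical are hypothetical; only the prefixes tric, foreh and tsun come from this issue. The extra startswith check is an assumption added here so that input like "tricycle" is not predicted as triceratops.

```python
import re

# Hypothetical table: full WK radical name -> minimal matching regex.
# Only the "tric", "foreh" and "tsun" prefixes are taken from this issue.
RADICAL_PATTERNS = {
    "triceratops": re.compile(r"tric[a-z]*"),
    "forehead": re.compile(r"foreh[a-z]*"),
    "tsunami": re.compile(r"tsun[a-z]*"),
}


def predict_radical(term):
    """Return the full WK radical name that `term` abbreviates, or None."""
    term = term.lower()
    for name, pattern in RADICAL_PATTERNS.items():
        # The regex enforces the minimal prefix (e.g. at least "tric").
        # The startswith check (an added assumption) rejects input that
        # diverges from the radical name after that prefix.
        if pattern.fullmatch(term) and name.startswith(term):
            return name
    return None
```

With this table, predict_radical("triceratop") and predict_radical("tric") both return "triceratops", while "tri" is too short to match and returns None.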

sschmidTU commented 3 years ago

To still match RTK keywords while predicting WK radicals, the predicted radical can be added as an additional query. For example, foreh.* (regex) can be predicted as WK's forehead radical (crown in RTK), but we should still also find 額, whose RTK keyword is forehead (amount in WK).
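
The additional-query idea could look roughly like this (a Python sketch under assumptions, not the project's code). PREDICTIONS and expand_queries are hypothetical names; the only mapping used is the one mentioned above, WK forehead = RTK crown. The original input is always kept as a query, so keyword matches such as 額 are not lost.

```python
import re

# Hypothetical entries: (minimal regex, WK radical name, RTK element name).
# The forehead -> crown mapping is the example given in this issue.
PREDICTIONS = [
    (re.compile(r"foreh[a-z]*"), "forehead", "crown"),
]


def expand_queries(term):
    """Return the original term plus the RTK name of any predicted WK radical."""
    term = term.lower()
    queries = [term]  # the raw input stays a query, so RTK keywords still match
    for pattern, wk_name, rtk_name in PREDICTIONS:
        if pattern.fullmatch(term) and wk_name.startswith(term):
            if rtk_name not in queries:
                queries.append(rtk_name)  # predicted radical as an extra query
    return queries
```

Here expand_queries("foreh") yields ["foreh", "crown"], and the search would then combine the results of both queries.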

Often the prediction is unambiguous, like tsun.* being tsunami, but maybe we should still add it as an additional query to be safe: it will be hard to check which part of each radical name is unique across the whole dataset.

Of course, this wouldn't be necessary if the whole dataset were annotated with WK radicals directly.
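
For the uniqueness check mentioned above, one possible starting point is to compute the shortest prefix of each radical name that no other name in a given vocabulary starts with. This is a Python sketch with a made-up sample vocabulary; shortest_unique_prefix is a hypothetical helper, and it only compares against the names passed in, not against every keyword or element in the real dataset.

```python
def shortest_unique_prefix(word, vocabulary):
    """Return the shortest prefix of `word` that no other vocabulary entry starts with.

    Returns None if `word` is itself a prefix of another entry.
    """
    others = [w for w in vocabulary if w != word]
    for length in range(1, len(word) + 1):
        prefix = word[:length]
        if not any(other.startswith(prefix) for other in others):
            return prefix
    return None


# Made-up sample; the real check would need all WK radical names and RTK keywords.
sample = ["tsunami", "triceratops", "tree", "forehead", "forest"]
prefixes = {name: shortest_unique_prefix(name, sample) for name in sample}
```

In this small sample "foreh" is the shortest unique prefix of forehead because forest shares "fore", which mirrors the foreh.* example above. On the full dataset the prefixes would likely need to be longer.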