Closed: dgisser closed this issue 5 months ago.
You did everything right (except setting max memory to 16000 GB 😅). Words are most likely not getting matched because Wiktionary puts diacritics (stress marks) on the headwords, and they aren't being handled. We'll need to add a case to the `normalizeOrthography` function (like #67).
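Here is a minimal sketch of what such a case might do. It is written in Python for illustration only; the project's real `normalizeOrthography` is JavaScript and is not reproduced here. The sketch assumes that the diacritics in question are the combining acute (U+0301) and grave (U+0300) stress marks. `normalize_orthography` and `STRESS_MARKS` are hypothetical names.

```python
import unicodedata

# Assumption: these are the marks Wiktionary adds to show stress on headwords.
STRESS_MARKS = {"\u0301", "\u0300"}  # combining acute, combining grave


def normalize_orthography(term: str, iso: str) -> str:
    """Hypothetical per-language normalization case (not the project's code)."""
    if iso in ("ru", "uk"):
        # Decompose so any precomposed stressed letter splits into base + mark,
        # drop only the stress marks, then recompose so letters such as
        # й (и + breve) and ї (і + diaeresis) come back as single characters.
        decomposed = unicodedata.normalize("NFD", term)
        stripped = "".join(ch for ch in decomposed if ch not in STRESS_MARKS)
        return unicodedata.normalize("NFC", stripped)
    return term


print(normalize_orthography("кри\u0301тика", "uk"))  # → критика
```

Stripping only the specific stress marks, rather than every combining character, avoids damaging letters like й and ї. Those letters also decompose under NFD, but they carry marks that are part of the spelling.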
As for the skipped term tags and parts of speech, that's normal. Parts of speech don't matter unless (or until) deinflection rules are written for that language. Adding a tag to a `tag_bank_term` controls whether it stays in parentheses or is moved into a Yomitan tag. For example, `anatomy` gets recognized and parsed out, and the rest are left as-is. I'm not too happy with how the tags look in Yomitan; it may have been better to leave them all in parentheses. There are also some tags that are invisible on Wiktionary but that kaikki somehow deduces. These won't be shown in the Yomitan dict unless they are added to a `tag_bank_term`.
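To make the parentheses-versus-tag behavior concrete, here is a simplified, hypothetical model in Python. It makes two assumptions:

- Glosses start with a comma-separated list of qualifiers in parentheses.
- Tag bank rows follow Yomitan's `[name, category, order, notes, score]` shape.

The repo's actual parsing logic and file contents may differ. `split_qualifiers` and `TAG_BANK_TERM` are made-up names.

```python
import re
from typing import List, Set, Tuple

# Hypothetical tag_bank_term rows in Yomitan's tag-bank shape:
# [name, category, order, notes, score]
TAG_BANK_TERM = [
    ["anatomy", "", 0, "anatomy", 0],
]
KNOWN_TAGS = {row[0] for row in TAG_BANK_TERM}

# Assumption: qualifiers appear as one leading "(a, b, ...)" group.
QUALIFIERS = re.compile(r"^\((?P<quals>[^)]*)\)\s*(?P<rest>.*)$")


def split_qualifiers(gloss: str, known: Set[str] = KNOWN_TAGS) -> Tuple[List[str], str]:
    """Move known leading qualifiers into tags; leave unknown ones in parentheses."""
    match = QUALIFIERS.match(gloss)
    if match is None:
        return [], gloss
    tags: List[str] = []
    leftover: List[str] = []
    for qualifier in (q.strip() for q in match.group("quals").split(",")):
        (tags if qualifier in known else leftover).append(qualifier)
    rest = match.group("rest")
    if leftover:
        rest = f"({', '.join(leftover)}) {rest}"
    return tags, rest


print(split_qualifiers("(anatomy, figuratively) the heart"))
# → (['anatomy'], '(figuratively) the heart')
```

In this model, a qualifier becomes a tag only if it has a row in the tag bank. That is why a tag kaikki deduces would stay hidden until someone adds a row for it.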
P.S. I remember reading this issue of yours back when the official policy in the Yomitan README was "no other languages". I might not have even tried to merge my fork with Yomitan and do all this if it hadn't been for that hint that there would be support for it, so thanks 🙏
Thanks!! Just copying the Russian `normalizeOrthography` rule greatly improves matching. Let me know if you would like me to submit a PR with these very minor changes. Also, I'm amazed that you remember that issue, in Korean no less! I'm so happy that Korean is available in Yomitan and that it is so powerful; it's much better than any other Chrome extension out there!
Feel free to open a PR; Ukrainian dicts will then be included automatically starting with the next release!
Also, check out the language docs to properly add Ukrainian to Yomitan. Texts with no diacritics or with full diacritics should work with these dicts. However, you'll want to add the same diacritics processing to Yomitan (like https://github.com/themoeway/yomitan/pull/1057) so that texts with partial diacritics, and other dicts, also work.
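On the Yomitan side, the idea is to normalize the scanned text the same way the headwords were normalized. Then any mix of marked and unmarked letters still finds the entry. Below is a minimal Python sketch that assumes the dictionary index is keyed by stress-stripped headwords and that only the combining acute and grave accents need removing. `strip_stress`, `lookup`, and the one-entry index are hypothetical. They are not Yomitan's actual text-preprocessing API or the code from that PR.

```python
import unicodedata
from typing import Dict, Optional

# Assumption: the marks to ignore are the combining acute and grave accents.
STRESS_MARKS = {"\u0301", "\u0300"}


def strip_stress(text: str) -> str:
    """Remove stress marks while keeping letters like й and ї precomposed."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in STRESS_MARKS)
    return unicodedata.normalize("NFC", kept)


def lookup(text: str, index: Dict[str, str]) -> Optional[str]:
    """Try the scanned text first, then its stress-stripped form."""
    for candidate in (text, strip_stress(text)):
        if candidate in index:
            return index[candidate]
    return None


# Hypothetical index keyed by headwords that were stripped at build time.
index = {"критика": "criticism"}
print(lookup("кри\u0301тика", index))  # → criticism
```

The sketch tries the unmodified text first, so a dictionary that does store stressed forms still gets an exact match before the fallback is used.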
Yeah, normally I would be really into doing something like that, but I'm just doing this for a friend who is learning Ukrainian. I don't know any Ukrainian (the most I can do is read the Russian alphabet and a few basic words), so just getting a dictionary set up is sufficient for my needs.
Thanks for creating this project! I'm trying to add Ukrainian; here's what I have so far:
- set up the `.env` file
- added `{"iso": "uk", "language": "Ukrainian", "flag": "🇺🇦"},` to `languages.json`
- ran `./auto.sh Ukrainian English`
This creates 2 zips, but when I import them into Yomitan the results are poor. If you go to a random Ukrainian wiki page, very few of the words highlight, including words that are definitely in kaikki, like критика.
We are skipping a ton of term tags, as well as parts of speech, so maybe this is part of the problem. Looking forward to any advice on how to resolve this!