mreichhoff / JapaneseGraph

A port of HanziGraph for Japanese
https://japanesegraph.com

Make it available for any language? #1

Open GrimPixel opened 2 years ago

GrimPixel commented 2 years ago

Just one program with different dictionary files using the morphemes as nodes, for any language.

mreichhoff commented 2 years ago

I tried to do something somewhat like that with another side project of mine: https://trielingual.com/ https://github.com/mreichhoff/TrieLingual

Admittedly it's not morpheme-based, but it's somewhat similar. Unfortunately, the few most common words (especially articles and prepositions) end up dominating almost everything. I could filter those out, or switch to a morpheme approach, I suppose...
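A minimal Python sketch of the "filter those out" option, for illustration only. It is not how TrieLingual works: `build_edges` and `top_n_to_drop` are hypothetical names, and the input is assumed to be sentences that are already split into words.

```python
from collections import Counter


def build_edges(sentences, top_n_to_drop):
    """Count adjacent-word edges after dropping the most frequent words.

    `sentences` is a list of word lists (assumed to be pre-tokenized).
    `top_n_to_drop` is a hypothetical knob: how many of the most
    frequent words to treat as stopwords.
    """
    counts = Counter(word for sentence in sentences for word in sentence)
    stopwords = {word for word, _ in counts.most_common(top_n_to_drop)}

    edges = Counter()
    for sentence in sentences:
        kept = [word for word in sentence if word not in stopwords]
        # Link each remaining word to the next remaining word.
        for left, right in zip(kept, kept[1:]):
            edges[(left, right)] += 1
    return stopwords, edges
```

With a cutoff like this, articles and prepositions no longer become hubs, at the cost of linking words that were not adjacent in the original sentence.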

GrimPixel commented 2 years ago

I see. That is trouble.

How about having CJKV in one program with Han characters as the nodes? Similar to what this guy did: https://cjkv-dict.com/
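A rough Python sketch of what "Han characters as the nodes" could look like: each character maps to the words that contain it, grouped by language. The function name, the language codes, and the sample words are assumptions for illustration. The sketch also ignores character variants (for example 学 versus 學), which a real merged graph would have to reconcile.

```python
from collections import defaultdict


def is_han(ch):
    # Assumption: only the main CJK Unified Ideographs block and
    # Extension A are checked here.
    return "\u4e00" <= ch <= "\u9fff" or "\u3400" <= ch <= "\u4dbf"


def build_character_index(words_by_language):
    """Map each Han character to {language: [words containing it]}."""
    index = defaultdict(lambda: defaultdict(list))
    for language, words in words_by_language.items():
        for word in words:
            # set() so a repeated character lists the word only once.
            for ch in set(word):
                if is_han(ch):
                    index[ch][language].append(word)
    return index


# Hypothetical sample data: two tiny word lists.
sample = {"zh": ["学生", "大学"], "ja": ["学生", "学校"]}
index = build_character_index(sample)
```

Here `index["学"]` holds both the Chinese and the Japanese words for that node, so a single graph node could show examples per language.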

Looks like your dictionaries are limited. There are still some licensed dictionaries that you can use, like CFDICT: https://polyglotclub.com/wiki/Language/Multiple-languages/Culture/Internet-Dictionaries

mreichhoff commented 2 years ago

I do have a Chinese version, and it probably does make sense to merge Chinese and Japanese into a single tool (whether with a language selector at the start or by combining the graph somehow). I'm not sure I see the value in including Korean and Vietnamese, but I know little about those two other than that they've had orthographic reforms and no longer use Han characters often (lamentable though the maintainer of cjkv seems to believe that is).

I do use CEDICT in the Chinese version, and JMDict in the Japanese version. I think any limitations in terms of coverage are more a result of my trimming those dictionaries (in the Chinese case, to the HSK1-6 or top10k word lists, though I'm working on a much larger wordset of ~50k of the most common words; in the Japanese case, I removed words that were less frequent than the top 20k by frequency or that didn't have kanji; I intend to expand that as well). Did you see other limitations? I could eventually add other base languages besides English, where CFDICT might come in, if I've understood its utility correctly.
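The Japanese trimming rule described above (keep words within the top 20k by frequency that contain kanji) can be sketched in Python roughly as follows. This is an illustration, not the repository's actual build script: `trim_dictionary`, the dict-based inputs, and the Unicode ranges used to detect kanji are all assumptions.

```python
def has_kanji(word):
    # Assumption: "kanji" means a character in the main CJK Unified
    # Ideographs block or Extension A. Marks such as 々 (U+3005)
    # fall outside these ranges.
    return any(
        "\u4e00" <= ch <= "\u9fff" or "\u3400" <= ch <= "\u4dbf"
        for ch in word
    )


def trim_dictionary(entries, frequency_rank, max_rank=20000):
    """Keep entries that contain kanji and rank within `max_rank`.

    `entries` maps word -> definition; `frequency_rank` maps
    word -> rank (1 = most frequent). Words with no known rank
    are dropped.
    """
    return {
        word: definition
        for word, definition in entries.items()
        if has_kanji(word) and frequency_rank.get(word, max_rank + 1) <= max_rank
    }
```

Raising `max_rank` is the knob that would expand coverage, which is the change discussed for both the Chinese and Japanese word sets.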

GrimPixel commented 2 years ago

In both North and South Korea, Han characters are still taught, even though the elites are trying to promote Hangul and suppress the use of Han script. Han characters are not essential for daily life; however, for many reputable professions, such as law, they are unavoidable. In South Korea, the Han character proficiency examination is the key to those professions, so most parents are eager to have their children learn Han characters. In Vietnam, the situation is similar, but because the Han script has been out of use for longer, no generation is good at it, and people's desire to use it is minimal. Integrating these two languages would help those who know Chinese or Japanese and are learning Korean or Vietnamese, as well as those who know Korean or Vietnamese and want to learn about the etymology of their words.

By the word “limited” I did mean the use of dictionaries. 50k words is actually about the ceiling for most native speakers. I have also noticed that other level lists could be used: TOCFL and JLPT, as well as TOPIK if Korean could be integrated.

mreichhoff commented 2 years ago

JLPT color coding is available in this repository already, I believe (unless that's a different JLPT).

Would your proposal be a single interactive graph that would then show meanings and examples in each of Chinese, Japanese, Korean, and Vietnamese? I think that would be a fairly distant work item for me, though definitely one I'd eventually be interested in looking into. Combining the existing two languages (and maybe adding Cantonese) would be nearer though.

(also, either way, thanks for the feedback! Much appreciated!)

GrimPixel commented 2 years ago

On the website I saw “Top1k”, “Top2k”, etc., so I thought that if it were JLPT, it would look like HanziGraph: “JLPT N5”, “JLPT N4”, etc.

Thank you for your consideration. Of course, it should be done step by step. If it were me, I would prefer using TSV files.
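As a sketch of the TSV idea, a per-language dictionary file could be read with Python's standard `csv` module. The column names (`word`, `reading`, `gloss`) are hypothetical; nothing in this thread fixes a file layout.

```python
import csv
import io


def load_tsv(stream):
    """Read tab-separated dictionary rows into a list of dicts.

    The first line is treated as the header. QUOTE_NONE keeps any
    quote characters inside glosses as literal text.
    """
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


# Hypothetical file content, inlined here instead of read from disk.
sample = io.StringIO(
    "word\treading\tgloss\n"
    "学生\tがくせい\tstudent\n"
    "学校\tがっこう\tschool\n"
)
rows = load_tsv(sample)
```

One such file per language would fit the "one program with different dictionary files" idea from the start of the thread.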

mreichhoff commented 2 years ago

Ah, the JLPT color coding is an option from the menu (the hamburger menu in the upper left); in the "color code based on:" dropdown, JLPT is available, and the legend should update when you pick a different choice (so the circles in the legend would say N5, N4, etc.)

GrimPixel commented 2 years ago

Good to know. I'll come back from time to time.